Abstract. Instrumental variables analysis using genetic markers as
instruments is now a widely used technique in epidemiology and
biostatistics. As single markers tend to explain only a small proportion
of phenotypic variation, there is increasing interest in using multiple
genetic markers to obtain more precise estimates of causal parameters.
Structural mean models (SMMs) are semiparametric models that use
instrumental variables to identify causal parameters. Recently, interest
has started to focus on using these models with multiple instruments,
particularly for multiplicative and logistic SMMs. In this paper we show
how additive, multiplicative and logistic SMMs with multiple orthogonal
binary instrumental variables can be estimated efficiently in models
with no further (continuous) covariates, using the generalised method
of moments (GMM) estimator. We discuss how the Hansen J-test can
be used to test for model misspecification, and how standard GMM
software routines can be used to fit SMMs. We further show that
multiplicative SMMs, like the additive SMM, identify a weighted average
of local causal effects if selection is monotonic. We use these methods to
reanalyse a study of the relationship between adiposity and hypertension
using SMMs with two genetic markers as instruments for adiposity.
We find strong effects of adiposity on hypertension.

Key words and phrases: Structural mean models, multiple instrumental
variables, generalised method of moments, Mendelian randomisation,
local average treatment effects.


1. INTRODUCTION 

Additive and multiplicative structural mean models
(SMMs) and G-estimation were introduced by
Robins (1989, 1994) for estimating the causal effects
of treatment regimes on outcomes from encouragement
designs, namely, randomised controlled trials
(RCTs) affected by noncompliance. Additive SMMs
are parameterised in terms of average treatment effects
and multiplicative SMMs in terms of causal
risk ratios; the G-estimators for these models are
consistent, asymptotically normal and can be constructed
to be semiparametrically efficient. Vansteelandt
and Goetghebeur (2003) subsequently developed
a class of estimators for generalised SMMs and,
in particular, the "double-logistic" SMM for estimating
causal odds ratios. Within this literature,
causal effects among the treated are identified by
the assumption of no effect modification by the
instrumental variable (NEM), that is, the causal effect
among the treated is the same at each level of
the instrumental variable; see, for example, Hernán
and Robins (2006). Alternative estimators and identifying
assumptions for generalised SMMs have also
been developed by Robins and Rotnitzky (2004),
Tan (2010) and, for a closely related class of models,
van der Laan, Hubbard and Jewell (2007).

The application of SMMs is not limited to encouragement
designs, however, and extends to the
analysis of observational studies using instrumental
variables; see, for example, Hernán and Robins
(2006). Instrumental variables analysis involves estimating
the causal effect of a temporally antecedent
predictor variable on an outcome using an instrumental
variable that is associated with the outcome
only through its association with the predictor.
Instrumental variables analysis has historically been
a domain of econometrics, but is now frequently
used within epidemiology and biostatistics. In particular,
genetic markers were proposed as instruments
for modifiable risk factors by Katan (1986)
and Davey Smith and Ebrahim (2003). Epidemiological
studies using genetic markers are known as
Mendelian randomisation studies after the assumption
that each individual's genotype is randomly
assigned at conception, which implies that the genetic
marker is an instrumental variable if it at least
partly explains variation in the risk factor. In practice,
genetic markers explain only a small proportion
of phenotypic variation, and so large sample sizes
are required to obtain any reasonable precision. The
number of genome-wide association studies has increased
as the costs of genotyping have decreased,
which has led to the identification of multiple genetic
variants for the same risk factor. An important
attraction of using multiple genetic variants as
instrumental variables is that, potentially, more precise
causal estimates can be obtained.

Techniques for multiple instruments in linear
instrumental variables analysis are already in use;
see, for example, Palmer et al. (2012). For linear
and nonlinear SMMs, the different frameworks we
have mentioned are all general enough to incorporate
multiple instrumental variables, but to date the
focus in applications has mainly been on cases involving
a single instrumental variable. The exceptions
are Bowden and Vansteelandt (2011) and Tan
(2010). In the first paper, within the frameworks
introduced by Robins (1994) and Vansteelandt and
Goetghebeur (2003), the authors propose a combination
of multiple instrumental variables into a single
instrumental variable which, they argue, leads
to an optimally efficient estimator. In the second
paper, multiple instrumental variables are directly
incorporated into the estimating equations, within
an alternative framework that introduces new structural
models together with doubly robust estimating
equations.

In this paper, we consider an alternative framework
based on the generalized method of moments
(GMM); see, for example, Hansen (1982) and Newey
(1993). GMM is widely used in econometrics for
the estimation of instrumental variables models.
We show how nonlinear SMMs with multiple instruments
can be formulated as instrumental variables
models and estimated using GMM. Furthermore,
if the instrumental variables result in an overidentified
model, then the Hansen J-test can be
used to test parametric identifying assumptions like
NEM. We also argue that GMM has good efficiency
properties for SMMs without baseline covariates.
Specifically, GMM is shown to be semiparametrically
efficient in cases where the instrumental variables
can be represented by a set of orthogonal binary
variables, in which case the efficient combination
of the instrumental variables is equivalent to
that proposed by Bowden and Vansteelandt (2011).
An important practical advantage of GMM is that it
can be implemented using existing routines in software
packages like Stata and R; see Chaussé (2010).

The focus of our presentation is on SMMs without
covariates because these models are widely applicable
to Mendelian randomisation studies. A drawback
to fitting SMMs with covariates using our approach
is that the user must correctly specify the
covariate effects in a model for the counterfactual
exposure-free outcomes, which cannot be tested for
misspecification. However, if the covariate effects
are saturated (in the sense that the covariates define
population strata and the SMM has a separate
parameter for the causal effect in each stratum),
then this counterfactual model is nonparametric and
cannot be misspecified, and the efficiency properties
listed above all hold. Saturated SMMs like this
can be used to deal with population stratification in
Mendelian randomisation studies; see, for example,
Lawlor et al. (2008). Tan (2010) also uses GMM but
applies it to a very different family of doubly robust
estimating equations for which the user must specify
the covariate effects in two sets of models; the
advantage of this approach is that each model can
be tested for misspecification, and the estimator remains
consistent for the SMM parameters even if
one set of models is misspecified.

In the second part of the paper, we consider the
interpretation of additive and multiplicative SMMs
with multiple instruments when the key NEM assumption
fails. In such circumstances, an additive
SMM with one binary instrument identifies a "local"
average treatment effect (LATE), also known
as a "complier" average causal effect (CACE),
provided that selection is monotonic, and multiplicative
SMMs identify local causal risk ratios; see,
for example, Clarke and Windmeijer (2010). When
there are multiple instruments, Imbens and Angrist
(1994) show that a GMM estimator for the additive
SMM identifies a weighted average of LATEs.
We extend their analysis to multiplicative SMMs to
show that a GMM estimator identifies weighted averages
of local risk ratios.

To demonstrate our findings, we reanalyse data
from a study of the relationship between hypertension
and adiposity by Timpson et al. (2009). In the
original study, two genetic markers were used as
instruments for adiposity and analysed using linear
instrumental variables models. We reanalyse this
study by focusing on hypertension as a binary outcome
and by estimating causal effects of adiposity
using multiplicative and logistic SMMs.

The remainder of the paper is organised as follows.
In Section 2 we review the potential outcomes
framework and the additive, multiplicative and logistic
SMMs, first for the simple case of a single
binary instrumental variable and then more generally.
In Section 3 we show how SMMs with a single
binary instrument can be formulated as an instrumental
variables model and estimated using GMM,
and in Section 4 extend this to multiple instrumental
variables. In Section 5 we discuss how GMM combines
multiple instruments efficiently for orthogonal
binary instruments. In Section 6 we present the results
of a Monte Carlo study for multiplicative and
logistic SMMs. In Section 7 we derive the multiple
instruments results for the local risk ratio. Finally,
in Section 8 we apply our estimation procedures to
reanalyse the adiposity and hypertension data of
Timpson et al. (2009), and in Section 9 make concluding
remarks. In the Appendix we provide Stata
and R code for the estimation of the three SMMs
using GMM.

2. STRUCTURAL MEAN MODELS 

2.1 The Basic Setup 

To introduce SMMs, we follow the exposition in
Hernán and Robins (2006) and focus on SMMs for a
randomised controlled trial where $Z_i$, $X_i$ and $Y_i$ are
i.i.d. dichotomous random variables for individual
subjects $i = 1, \ldots, n$ drawn from the target population.
For individual $i$, let $Z_i$ be a binary indicator of
treatment assignment following randomization, $X_i$
the selected treatment, and $Y_i$ the study outcome.
For notational simplicity the subject index is sometimes
suppressed for the random variables.

The potential outcomes can now be defined in the
usual way. The potential treatments $X_0$ and $X_1$ are
the treatments selected by the individual following
assignment to treatment $z = 0, 1$, respectively. Similarly,
the potential (study) outcome $Y_{xz}$ is that obtained
if the individual is assigned to treatment $z$
but given treatment $x$. Using potential outcomes
notation, we can now state five key conditions that
must be satisfied for causal inference: (i) the "stable
unit treatment value assumption" that each individual's
potential treatments and potential study
outcomes are mutually independent of those for any
other individual; (ii) the "consistency assumption"
$X = X_Z$ and $Y = Y_{XZ}$ that links the observed
realisations to the potential outcomes; (iii) the "independence
assumption", potential outcomes $Y_{xz}$ are
independent of $Z$; (iv) the "exclusion restriction"
$Y_{xz} = Y_x$; and (v) the "association assumption", there is
an association between $X$ and $Z$. Alternative statements
of the key conditions can be found in Robins
and Rotnitzky (2004) and Tan (2010).

2.2 SMM Identification 

For the basic setup defined above, the generalized 
SMM of Vansteelandt and Goetghebeur (2003) is 

$$
h\{E(Y \mid X, Z)\} - h\{E(Y_0 \mid X, Z)\} = (\psi_0 + \psi_1 Z)X, \tag{1}
$$

where $Y_0$ is often referred to as the exposure-free
potential outcome, and $h$ is the link function
that determines the interpretation of the target
causal parameters $\psi_0$ and $\psi_0 + \psi_1$. For example,
the identity link leads to the additive SMM
$E(Y \mid X, Z) - E(Y_0 \mid X, Z) = (\psi_0 + \psi_1 Z)X$, where
$\psi_0 = E(Y_1 - Y_0 \mid X = 1, Z = 0)$ and
$\psi_0 + \psi_1 = E(Y_1 - Y_0 \mid X = Z = 1)$ are both average
treatment effects; the log link leads to the multiplicative SMM
$E(Y \mid X, Z)/E(Y_0 \mid X, Z) = \exp\{(\psi_0 + \psi_1 Z)X\}$, where
$\exp(\psi_0) = E(Y_1 \mid X = 1, Z = 0)/E(Y_0 \mid X = 1, Z = 0)$
and $\exp(\psi_0 + \psi_1) = E(Y_1 \mid X = Z = 1)/E(Y_0 \mid X = Z = 1)$
are causal risk ratios.

The SMM parameters are identified by exploiting
the conditional mean independence (CMI), or randomisation,
assumption

$$
E(Y_0 \mid Z) = E(Y_0), \tag{2}
$$

which follows automatically from the key conditions
on $Z$ specified above. For the additive SMM,
$h$ is the identity link and
$E(Y_0 \mid Z) = E\{Y - (\psi_0 + \psi_1 Z)X \mid Z\}$;
and for the multiplicative SMM, $h = \log$
and $E(Y_0 \mid Z) = E[Y \exp\{-(\psi_0 + \psi_1 Z)X\} \mid Z]$.
However, the CMI assumption (2) alone does not identify
$\psi_0$ and $\psi_1$; for instance, in this simple setup, CMI
implies the single independent moment condition

$$
E\{Y - (\psi_0 + \psi_1)X \mid Z = 1\} = E(Y - \psi_0 X \mid Z = 0), \tag{3}
$$

under the additive SMM. In other words, there is
one moment condition with two unknowns. Hence,
we must impose dimension-reducing constraints on
the SMM. Hernán and Robins (2006) highlight the
importance of no effect modification by $Z$ (NEM),
which constrains $\psi_1 = 0$ in (3) and identifies $\psi_0$.
Under NEM, the parameter $\psi_0$ of the additive SMM
can be interpreted as $E(Y_1 - Y_0 \mid X = 1)$, that is, the
average causal effect among the treated; and the
parameter $\exp(\psi_0)$ of the multiplicative SMM can be
interpreted as $E(Y_1 \mid X = 1)/E(Y_0 \mid X = 1)$, that is,
the causal risk ratio among the treated.
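
The step from the SMM to the expression for $E(Y_0 \mid Z)$ can be spelled
out as follows. This is a sketch for the multiplicative SMM under NEM,
using only the model definition and the law of iterated expectations;
the additive case is analogous with subtraction in place of division.

$$
\begin{aligned}
E(Y_0 \mid X, Z) &= E(Y \mid X, Z)\exp(-\psi_0 X)
  && \text{(multiplicative SMM with } \psi_1 = 0\text{)} \\
E(Y_0 \mid Z) &= E\{E(Y \mid X, Z)\exp(-\psi_0 X) \mid Z\}
  && \text{(average over } X \text{ given } Z\text{)} \\
&= E\{Y \exp(-\psi_0 X) \mid Z\}
  && \text{(} \exp(-\psi_0 X) \text{ is a function of } X\text{)}
\end{aligned}
$$

The last line depends only on observable quantities and $\psi_0$, which is
why CMI together with NEM is enough to identify $\psi_0$ in this case.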

Generally, the form of $E(Y_0 \mid Z)$ is more complex
than for the additive and multiplicative SMMs because
the inverse link function $h^{-1}$ is not separable.
Specifically, for the additive SMM, $h = h^{-1}$ is the
additively separable identity function [i.e.,
$h^{-1}(a + b) = h^{-1}(a) + h^{-1}(b)$]; and for the multiplicative
SMM, $h = \log$ so that $h^{-1} = \exp$ is multiplicatively
separable [i.e., $h^{-1}(a + b) = h^{-1}(a) \times h^{-1}(b)$]. For
nonseparable $h^{-1}$, however, CMI and NEM do not
alone identify the parameters of SMMs. For example,
consider the logistic SMM

$$
\operatorname{logit}\{E(Y \mid X, Z)\} - \operatorname{logit}\{E(Y_0 \mid X, Z)\}
  = (\psi_0 + \psi_1 Z)X, \tag{4}
$$


where $\operatorname{logit}(p) = \log\{p/(1 - p)\}$ and the parameters
$\exp(\psi_0)$ and $\exp(\psi_0 + \psi_1)$ are causal odds ratios for
the $(X, Z) = (1, 0)$ and $(1, 1)$ groups, respectively;
assuming that CMI and NEM hold,

$$
E(Y_0 \mid Z) = E[\operatorname{expit}\{\operatorname{logit}(E(Y \mid X, Z)) - \psi_0 X\} \mid Z], \tag{5}
$$

where $\operatorname{expit}(a) = \exp(a)/\{1 + \exp(a)\}$ is the nonseparable
inverse logit function. It is clear that $\psi_0$ is
not identified unless $E(Y \mid X, Z)$ is known; see, for
example, Robins (2000). Hence, to identify $\psi_0$, it is
necessary to specify an association model

$$
h_a\{E(Y \mid X, Z)\} = m_\beta(X, Z), \tag{6}
$$

where $h_a$ is its link function and $m_\beta(X, Z)$ its linear
predictor. Vansteelandt and Goetghebeur (2003)
specify the double-logistic SMM such that $h_a = h = \operatorname{logit}$,
where the SMM parameters are identified by
the conditional moment conditions

$$
\begin{aligned}
E[\operatorname{expit}\{m_\beta(X, Z) - \psi_0 X\} \mid Z = 0]
  &= E[\operatorname{expit}\{m_\beta(X, Z) - \psi_0 X\} \mid Z = 1], \\
E[Y - \operatorname{expit}\{m_\beta(X, Z)\} \mid X, Z] &= 0,
\end{aligned}
$$

provided that the association model is correctly
specified. A saturated association model is
$m_\beta(X, Z) = \beta_0 + \beta_1 X + \beta_2 Z + \beta_3 XZ$ for the simple setup
considered here, and is nonparametric in the sense
of placing no constraints on the distribution of the
observed data. However, nonsaturated logistic association
models are potentially uncongenial to the
logistic SMM and hence misspecified; see Robins
and Rotnitzky (2004). Robins and Rotnitzky (2004)
propose an estimator that solves this problem, but
Vansteelandt et al. (2011) argue that the impact of
an uncongenial association model will be small in
practice.

As highlighted by Vansteelandt and Goetghebeur
(2005) and Tan (2010), for more general scenarios
where any or all of $X$, $Z$ and $Y$ are nonbinary, NEM
is not the only identifying assumption for SMMs.
For example, if $Z$ has three categories and $X$ is binary,
then CMI implies 3 independent moment conditions,
and so the model can be identified if it is
correct to assume that $Z$ has a linear effect and the
SMM is $(\psi_0 + \psi_1 Z)X$, which identifies both SMM
parameters without needing to assume NEM.

2.3 Estimating Equations

The construction of consistent estimating equations
requires the specification of suitable unconditional
moment conditions based on the conditional
moment conditions introduced above. The estimating
equations are sample analogues of these unconditional
moment conditions, and the different estimating
approaches in the SMM literature differ in how
these unconditional moment conditions are specified.
We first consider estimating equations for simple
scenarios involving only binary variables, before
moving on to the more general case.

Robins (1994) derived G-estimation for additive
and multiplicative SMMs. The G-estimator is based
on an unconditional moment condition of the form

$$
E[\{Z - E(Z)\}E(Y_0 \mid Z)] = 0, \tag{7}
$$

which holds under (2). As shown above, for SMMs
with separable inverse link functions, we can write
$E(Y_0 \mid Z) = E\{h^*(X, Y; \psi_0) \mid Z\}$, where $h^*$ is determined
by the SMM and NEM is taken to hold. Thus,
the sample analogue of (7) is

$$
n^{-1} \sum_{i=1}^{n} \{Z_i - E(Z)\}\, h^*(X_i, Y_i; \hat\psi_0) = 0, \tag{8}
$$

where, for example, $h^*(X, Y; \psi_0) = Y - \psi_0 X$ for the
additive SMM and $h^*(X, Y; \psi_0) = Y\exp(-\psi_0 X)$ under
the multiplicative SMM. Under regularity conditions,
$\hat\psi_0$ is a consistent estimator for $\psi_0$ under
CMI provided that (a) the SMM is correctly specified
and (b) $E(Z)$ is known. The second of these
conditions will be satisfied if $Z$ is based on a known
allocation rule such as randomisation. Otherwise, if
$E(Z)$ is unknown, we must specify a (trivial) model
$E(Z) = \mu$ and replace $E(Z)$ in (8) with $\hat\mu$, that
is, a consistent estimator of $\mu$. Robins, Mark and
Newey (1992) note that the correct asymptotic covariance
matrix for $\hat\psi_0$ can only be derived from an
extended system of moment conditions that includes
$E(Z - \mu) = 0$; see also Vansteelandt and Goetghebeur
(2003) and Tan (2010). Conversely, treating $\hat\mu$
as known when deriving the asymptotic variance of
$\hat\psi_0$ leads to an expression that is too large, and results
in conservative inferences; see Robins, Mark
and Newey (1992) and Vansteelandt and Goetghebeur
(2003).
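
As an illustration of estimating equation (8), the following Python
sketch solves it for the multiplicative SMM, with $E(Z)$ replaced by the
sample mean of $Z$. The function names, the bisection solver, its search
interval and the toy data are all assumptions made for this example and
are not taken from the paper.

```python
import math


def g_estimating_function(psi, y, x, z):
    """Left-hand side of (8) for the multiplicative SMM.

    Uses h*(X, Y; psi) = Y exp(-psi X) and replaces E(Z) by the sample
    mean of z (a consistent estimator of mu).
    """
    n = len(z)
    mu_hat = sum(z) / n
    return sum((zi - mu_hat) * yi * math.exp(-psi * xi)
               for yi, xi, zi in zip(y, x, z)) / n


def solve_psi(y, x, z, lo=-5.0, hi=5.0, tol=1e-10):
    """Find the root of the estimating function by bisection.

    The bracket [lo, hi] is an assumption; it must contain a sign change.
    """
    f_lo = g_estimating_function(lo, y, x, z)
    f_hi = g_estimating_function(hi, y, x, z)
    if f_lo * f_hi > 0:
        raise ValueError("estimating function has no sign change on [lo, hi]")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = g_estimating_function(mid, y, x, z)
        if f_lo * f_mid <= 0:
            hi = mid              # root lies in [lo, mid]
        else:
            lo, f_lo = mid, f_mid  # root lies in [mid, hi]
    return 0.5 * (lo + hi)


# Hypothetical toy data: 4 subjects with Z = 0 and 4 with Z = 1.
Z = [0, 0, 0, 0, 1, 1, 1, 1]
X = [0, 0, 0, 1, 0, 1, 1, 1]
Y = [0, 1, 0, 1, 0, 1, 1, 1]

if __name__ == "__main__":
    psi_hat = solve_psi(Y, X, Z)
    print("log causal risk ratio:", psi_hat)
    print("causal risk ratio:", math.exp(psi_hat))
```

For these toy data the estimating function reduces to
$(-0.5 + e^{-\psi})/8$, so the root is $\log 2$. The sketch returns a point
estimate only; as noted above, valid standard errors require the extended
system of moment conditions that includes $E(Z - \mu) = 0$.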

The estimating equations for the double-logistic
SMM are

$$
n^{-1} \sum_{i=1}^{n} (Z_i - \mu)\operatorname{expit}\{m_\beta(X_i, Z_i) - \hat\psi_0 X_i\} = 0, \tag{9}
$$

where $\mu = E(Z)$ as before. Due to the nonseparability
of the expit function, the estimating equation
involves the association model
$\operatorname{logit}\{E(Y \mid X, Z)\} = m_\beta(X, Z)$. As with $\mu$, we must
replace $\beta$ in (9) with
a consistent estimator $\hat\beta$, and the correct asymptotic
covariance can only be derived from a set of moment
conditions that includes ones for $m_\beta(X, Z)$ as well
as for $\mu$. Conservative inferences again result if $\hat\beta$
is treated as known when deriving the asymptotic
covariance matrix.

More generally, for models involving multiple or
continuous instrumental variables, the estimators
above are based on unconditional moment conditions
of the form

$$
E[\{d(Z) - \mu_d\}E(Y_0 \mid Z)] = 0, \tag{10}
$$

where $E(Y_0 \mid Z)$ is determined by the SMM, $d(Z)$ is
a user-specified function, and $\mu_d = E\{d(Z)\}$. The
choice of $d(Z)$ does not affect consistency but does
affect efficiency. Robins (1994) derives the choice of
$d(Z) = d_{\mathrm{opt}}(Z)$ for the additive and multiplicative
SMMs so that the first-order asymptotic variance
is minimised and the estimator is semiparametrically
efficient; Vansteelandt and Goetghebeur (2003)
derive the equivalent choice for the double-logistic
SMM. For further details see, for example, Tsiatis
(2006) and Bowden and Vansteelandt (2011).

2.4 Covariates 

In this paper we focus mainly on SMMs that do
not condition on baseline covariates, but for completeness
we discuss here the estimation of SMMs
which do include covariates; the treatment of covariates
is discussed further in Section 9. A generalised
SMM with baseline covariates $C$ has the form

$$
h\{E(Y \mid X, Z, C)\} - h\{E(Y_0 \mid X, Z, C)\} = \eta_\psi(X, Z, C),
$$

where $\psi$ is the SMM parameter vector and $\eta_\psi(X, Z, C)$
must satisfy $\eta_\psi(0, Z, C) = 0$. If $h^{-1}$ is nonseparable,
then the association model is specified as
$h\{E(Y \mid X, Z, C)\} = m_\beta(X, Z, C)$. In terms of identifying
assumptions, CMI is now conditional on baseline
$C$ such that

$$
E(Y_0 \mid Z, C) = E(Y_0 \mid C),
$$

where NEM corresponds to $\eta_\psi(X, Z, C) = \eta_\psi(X, C)$
and alternative dimension-reducing parametric constraints
are discussed by Vansteelandt and Goetghebeur
(2005) and Tan (2010). Finally, the unconditional
moment condition (10) on which the estimating
equations are based becomes

$$
E[\{d(Z, C) - \mu_d(C)\}E(Y_0 \mid Z, C)] = 0,
$$

where $E(Y_0 \mid Z, C)$ is determined, as before, by the
SMM, $d(Z, C)$ is a user-specified function,
and $\mu_d(C) = E\{d(Z, C) \mid C\}$. Consistency thus
depends on correctly specifying the conditional distribution
of $Z$ given $C$ so that $\mu_d(C)$ is correct for
given $d$. Robins (1994) and Vansteelandt and Goetghebeur
(2003) derive the optimal choices of $d$ for
additive, multiplicative and double-logistic SMMs
when $\Pr(Z = z \mid C)$ is presumed to be known; see
also Bowden and Vansteelandt (2011).

An important special case for Mendelian randomisation
studies is where there are discrete baseline covariates
to handle population stratification; see, for
example, Lawlor et al. (2008). The generalized SMM
with saturated covariate effects can be written

$$
h\{E(Y \mid X, Z, C = c)\} - h\{E(Y_0 \mid X, Z, C = c)\} = X\psi_c,
$$

where NEM is taken to hold, and $\psi_c$ is a unique
parameter for the population in the stratum defined
by $C = c$. Saturated models of this form are
equivalent to specifying separate no-covariate SMMs
within each stratum. Therefore, it can be shown that
all of the results in this paper regarding no-covariate
SMMs also apply to saturated-covariate SMMs; see
also Angrist and Imbens (1995), Theorem 3.
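
The equivalence between a saturated-covariate SMM and separate
no-covariate SMMs can be illustrated with a short Python sketch for the
additive SMM with one binary instrument. It simply computes the classical
instrumental variable estimate of $\psi_c$ within each stratum. The function
names, the record layout `(y, x, z, c)` and the toy data are assumptions
for this example.

```python
from collections import defaultdict


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def wald(rows):
    """Classical IV estimate from (y, x, z) rows with binary z."""
    y1 = mean(y for y, x, z in rows if z == 1)
    y0 = mean(y for y, x, z in rows if z == 0)
    x1 = mean(x for y, x, z in rows if z == 1)
    x0 = mean(x for y, x, z in rows if z == 0)
    return (y1 - y0) / (x1 - x0)


def stratum_estimates(records):
    """Fit a separate no-covariate additive SMM in each stratum C = c."""
    by_stratum = defaultdict(list)
    for y, x, z, c in records:
        by_stratum[c].append((y, x, z))
    return {c: wald(rows) for c, rows in by_stratum.items()}


# Hypothetical toy records (y, x, z, c) for two strata "a" and "b".
RECORDS = (
    [(0, 0, 0, "a"), (1, 0, 0, "a"), (0, 0, 0, "a"), (1, 1, 0, "a"),
     (0, 0, 1, "a"), (1, 1, 1, "a"), (1, 1, 1, "a"), (1, 1, 1, "a")]
    + [(0, 0, 0, "b"), (0, 0, 0, "b"), (1, 1, 1, "b"), (1, 1, 1, "b")]
)

if __name__ == "__main__":
    print(stratum_estimates(RECORDS))
```

Each stratum needs both levels of the instrument and a nonzero difference
in mean exposure between them; otherwise the ratio is undefined.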

Tan (2010) develops an alternative family of doubly
robust estimating equations specifically for generalised
SMMs with nonseparable inverse link functions
that include continuous covariates. Furthermore,
he allows for the inclusion of an extended
set of covariates $V$ that includes $C$ so that additional
covariates predictive of $Z$, $X$ and $Y$ can be
incorporated. The analyst first chooses a working
distribution $p^*(z \mid c)$ for $\Pr(Z = z \mid C = c)$ that is
arbitrary and so does not have to be correct. The analyst
must then specify two sets of parametric models
involving the full covariates $V$: (a)
$\Pr(Z = z \mid V = v) = k_\lambda(z \mid v)$; and (b)
$\Pr(X = x \mid Z = z, V = v) = g_\alpha(x \mid z, v)$ and
$E(Y \mid X, Z, V) = m^*_\nu(X, Z, V)$. Using
the law of iterated expectations, it can be shown
that the following estimating equation is consistent
for $\psi$ if either model (a) or model (b) is misspecified
(but not both):



P*{Zi |C0 
k^ZilVi) 



P*(Zi\Ci) 

%(^|V0 




= 0, 


where $\Delta_{i,\psi} = Y_i - h^{-1}\{m_\beta(X_i, Z_i, C_i)\} +
h^{-1}\{m_\beta(X_i, Z_i, C_i) - \eta_\psi(X_i, Z_i, C_i)\}$,


W = Y / 9a(x , \Z i ,V i )[h-\rnUx',Z i ,V i )} 

x' 



Yi\, 


is an estimator of $E(\Delta_{i,\psi} \mid Z, V)$, and
$E^*_{Z \mid C=c}(\cdot) = \sum_{z'} p^*(z' \mid c)(\cdot)$ if $Z$ is discrete.
Three important features to note are that $\Delta_{i,\psi} - Y_i$
does not depend on $Y_i$, $\Delta_{i,\psi}$ is the key to identification because
$E\{E(\Delta_{i,\psi} \mid Z_i, V_i) \mid Z_i, C_i\} = E(Y_{i0} \mid Z_i, C_i)$, and, while
$p^*$ does not need to be correctly specified, one must
construct $\phi_i = d(Z_i, C_i) - \mu^*(C_i)$ for user-specified $d$
where $\mu^*(C) = E^*_{Z \mid C}\{d(Z, C)\}$. Tan (2010) also considers
other doubly robust estimating schemes and
argues that the estimator based on the estimating
equations above is locally efficient given the analyst's
choices of $p^*$ and $d$.


3. THE GENERALISED METHOD OF MOMENTS


In this section we propose an alternative approach
to constructing estimating equations based on the
generalized method of moments (GMM). Hansen
(1982) proposed GMM for moment-condition models
of the form $E\{g(\delta)\} = 0$, where $g(\delta)$ is a random
vector and a function of parameter $\delta$, and 0 is an
appropriately dimensioned column vector of zeros. A
general expression for the GMM estimator is given
by

$$
\hat\delta = \arg\min_{\delta}
  \Bigl\{ n^{-1} \sum_{i=1}^{n} g_i(\delta) \Bigr\}'
  W_n^{-1}
  \Bigl\{ n^{-1} \sum_{i=1}^{n} g_i(\delta) \Bigr\}, \tag{11}
$$

where $g_i(\delta)$ is the random vector for subject $i$, $g_i'(\delta)$
is its transpose, and $W_n$ is a user-chosen weight
matrix that determines the efficiency of the estimator.
Tan (2010) has applied the theory of GMM to
the doubly robust estimating equations discussed in
the previous section, but the focus here is on its use
in econometrics for instrumental variables models of
the form

$$
g(\delta) = v(\delta)S, \tag{12}
$$

where $v(\delta)$ is known as the generalized residual and
$S$ is a random vector of instrumental variables. The
generalized residual is so called because it satisfies
$E\{v(\delta) \mid S\} = 0$. We show how any nonlinear SMM
can be expressed as an instrumental variables model
by exploiting that $E\{Y_0 - E(Y_0) \mid Z\} = 0$ under CMI
(2) and by developing estimating equations which
are sample analogues of

$$
E[d(S)E\{Y_0 - E(Y_0) \mid S\}] = 0,
$$

where $d(S)$ is a user-specified function that affects
efficiency but not consistency. The choice of $d(S)$
that minimises the variance of the GMM estimator,
the so-called efficient instrument, depends on $W_n$
and will be discussed further on.

In our simple scenario involving only binary variables,
the SMM is just identified in the sense that it
has one parameter and one moment condition under
CMI (for now taking $\beta$ to be known for the double-logistic
SMM). For example, the additive SMM under
NEM leads to the well-known estimator

$$
\psi_0 = \frac{E(Y \mid Z = 1) - E(Y \mid Z = 0)}{E(X \mid Z = 1) - E(X \mid Z = 0)} \tag{13}
$$

in this case, namely, the classical instrumental variable
estimator; see, for example, Hernán and Robins
(2006). Theory based on the GMM estimator (11)
is not needed here because $\psi_0$ is simply the solution
to (3) under NEM, and the choice of $d(S)$ is irrelevant
because $Z$ is binary. However, we can use this
simple example to show how the additive SMM can
be specified as an instrumental variables model.
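
The classical instrumental variable estimator for a single binary
instrument can be computed by replacing the conditional expectations with
sample means. The following Python sketch does this; the function names
and the toy data are assumptions introduced for illustration.

```python
def mean(values):
    values = list(values)
    return sum(values) / len(values)


def wald_estimate(y, x, z):
    """Classical IV estimate of psi_0 for a binary instrument z.

    Ratio of the difference in mean outcome to the difference in mean
    exposure between the z = 1 and z = 0 groups.
    """
    y1 = mean(yi for yi, zi in zip(y, z) if zi == 1)
    y0 = mean(yi for yi, zi in zip(y, z) if zi == 0)
    x1 = mean(xi for xi, zi in zip(x, z) if zi == 1)
    x0 = mean(xi for xi, zi in zip(x, z) if zi == 0)
    if x1 == x0:
        raise ValueError("instrument is not associated with exposure")
    return (y1 - y0) / (x1 - x0)


# Hypothetical toy data: 4 subjects with Z = 0 and 4 with Z = 1.
Z = [0, 0, 0, 0, 1, 1, 1, 1]
X = [0, 0, 0, 1, 0, 1, 1, 1]
Y = [0, 1, 0, 1, 0, 1, 1, 1]

if __name__ == "__main__":
    print("psi_0 estimate:", wald_estimate(Y, X, Z))
```

With these toy data the numerator is 0.75 - 0.5 and the denominator is
0.75 - 0.25, giving an estimate of 0.5.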

First, the CMI moment condition can be written
as $E(Y_0 \mid Z = z) - \alpha_0 = 0$ for $z = 0, 1$, where $E(Y_0)$ is
simply treated as an extra parameter $\alpha_0$ and results
in the additional moment condition $E(Y_0) - \alpha_0 = 0$.
It follows that one of $E(Y_0 \mid Z = z) - \alpha_0 = 0$ is redundant
because $Z$ is discrete and $E\{E(Y_0 \mid Z)\} = \alpha_0$ by
definition. However, using the additional
$E(Y_0) - \alpha_0 = 0$ moment condition allows the system of moment
conditions to be expressed in terms of a generalised
residual and a vector of instrumental variables
as in (12). For example, under the additive SMM, it
follows that

$$
\begin{pmatrix}
E(Y - \psi_0 X) - \alpha_0 \\
E(Y - \psi_0 X \mid Z = 1) - \alpha_0
\end{pmatrix} = 0
\;\Longrightarrow\;
E\begin{pmatrix}
Y - \psi_0 X - \alpha_0 \\
(Y - \psi_0 X - \alpha_0)Z
\end{pmatrix} = 0, \tag{14}
$$

that is, $E\{g(\psi_0, \alpha_0)\} = 0$, where
$g(\psi_0, \alpha_0) = (Y - \psi_0 X - \alpha_0)S$ and $S = (1, Z)'$.
Similarly, for the multiplicative SMM, it follows that

$$
E\begin{pmatrix}
Y\exp(-\psi_0 X) - \alpha_0 \\
\{Y\exp(-\psi_0 X) - \alpha_0\}Z
\end{pmatrix} = 0, \tag{15}
$$

and for the double-logistic SMM with a saturated
association model,

$$
E\begin{pmatrix}
\operatorname{expit}(\beta_0 + \beta_1 X + \beta_2 Z + \beta_3 XZ - \psi_0 X) - \alpha_0 \\
\{\operatorname{expit}(\beta_0 + \beta_1 X + \beta_2 Z + \beta_3 XZ - \psi_0 X) - \alpha_0\}Z
\end{pmatrix} = 0. \tag{16}
$$

The estimators for these three models are trivial
special cases of GMM because each is just identified,
but it is clear that moment conditions (14)-(15) are
of the form $E\{v(\delta)S\} = 0$, where 0 is an appropriately
dimensioned vector of zeros. It is also clear that
moment condition (16) for the double-logistic SMM
has the more complicated form $E\{g(\delta; \beta)\} = 0$, because
the vector of association model parameters $\beta$
is usually unknown. We now discuss what happens
when $S$ is expanded to include multiple instrumental
variables.
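
To make the "generalised residual times instrument" structure concrete,
the following Python sketch evaluates the sample analogues of moment
conditions (14) and (15) with $S = (1, Z)'$. The function names, the
`model` argument and the toy data are assumptions for this example.

```python
import math


def generalised_residual(model, psi, alpha, y, x):
    """v(delta) for delta = (alpha_0, psi_0) under the chosen SMM."""
    if model == "additive":
        return y - psi * x - alpha
    if model == "multiplicative":
        return y * math.exp(-psi * x) - alpha
    raise ValueError("model must be 'additive' or 'multiplicative'")


def sample_moments(model, psi, alpha, ys, xs, zs):
    """Sample analogue of E{v(delta) S} with S = (1, Z)'."""
    n = len(ys)
    g = [0.0, 0.0]
    for y, x, z in zip(ys, xs, zs):
        v = generalised_residual(model, psi, alpha, y, x)
        g[0] += v / n      # element for the constant instrument
        g[1] += v * z / n  # element for the binary instrument Z
    return g


# Hypothetical toy data: 4 subjects with Z = 0 and 4 with Z = 1.
Z = [0, 0, 0, 0, 1, 1, 1, 1]
X = [0, 0, 0, 1, 0, 1, 1, 1]
Y = [0, 1, 0, 1, 0, 1, 1, 1]

if __name__ == "__main__":
    # Both moments vanish at the just-identified solutions for these data.
    print(sample_moments("additive", 0.5, 0.375, Y, X, Z))
    print(sample_moments("multiplicative", math.log(2), 0.375, Y, X, Z))
```

Because each model is just identified here, both sample moments can be
set to zero exactly, which is what the two calls above demonstrate for
the toy data.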


4. MULTIPLE INSTRUMENTS 

Mendelian randomisation studies justify the use of 
genetic markers as instrumental variables by arguing 
that (a) the random allocation of genes from parents 
to offspring mimics a randomised experiment, and 
(b) there is an established relationship between the 
marker and some modifiable risk factor of interest; 
see, for example, Katan (1986), Davey Smith and 
Ebrahim (2003) and Lawlor et al. (2008). 

The genetic variant typically has three forms: homozygous
for the common allele; heterozygous; and
homozygous for the rare allele. If we code these 0, 1
and 2, respectively, then the resulting instrument $Z$
is multivalued. In fact, this is a simple multiple instruments
example because the three-level variable
can be coded using two orthogonal binary variables,
for example, $Z_1 = I(Z = 1)$ and $Z_2 = I(Z = 2)$,
where $I$ is the indicator function.
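
The coding of a three-level genotype into two binary instruments can be
sketched in Python as follows. The function name is an assumption, and
the leading constant is included only so that the result can serve as an
instrument vector with an intercept.

```python
def genotype_to_instruments(z):
    """Map a genotype coded 0, 1 or 2 to (1, Z1, Z2).

    Z1 = I(Z = 1) and Z2 = I(Z = 2); the leading 1 is a constant term.
    """
    if z not in (0, 1, 2):
        raise ValueError("genotype must be coded 0, 1 or 2")
    return (1, int(z == 1), int(z == 2))


if __name__ == "__main__":
    for genotype in (0, 1, 2):
        print(genotype, genotype_to_instruments(genotype))
```

The two indicators are mutually exclusive, so their product is zero for
every subject.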












CLARKE, PALMER AND WINDMEIJER 


4.1 Additive SMM 


and 


The additive SMM for multiple instruments in this 
case can be written as 

E(Y\X,Z 1 ,Z 2 )-E(Y 0 \X,Z 1 ,Z 2 ) 

= (V’o + ipiZ 1 + i/j 2 Z 2 )X, 

where NEM corresponds to constraining Vh = ^2 = 
0 and CMI yields the moment conditions 

( E{Y-^ 0 X-ao) ) ( 0\ 

< EiX-ifoX-00^ = 1) > = 0 , 

{E(Y-^ 0 X-a 0 \Z 2 = l)j \0j 

where a® = E{Yq) as before. The unconditional mo¬ 
ment condition is 

E{(Y-^ 0 X-a 0 )S} = O, 


(19) 


f y-exp(q*+X^o) g 1 = Q 

1 exp(aj + Xi/j 0 ) / 


where (19) is obtained simply by dividing (18) by 
exp(«Q) 0. Moment condition (19) is the same as 
that for exponential-mean models proposed by Med¬ 
ially (1997). 

where $S = (1, Z_1, Z_2)'$ is a random vector representing the multiple instruments; note that $S$ is orthogonal because its elements are mutually exclusive such that $SS' = \mathrm{diag}(S)$. In fact, this model is linear and so the parameters can be consistently estimated using standard Two-Stage Least Squares (2SLS). The 2SLS estimator can be obtained as the ordinary least squares (OLS) estimator from regressing $Y$ on $\hat{X}$, where $\hat{X}$ is the prediction from the first-stage regression of $X$ on $S$. The 2SLS estimator is a special case of a “one-step” GMM estimator with $W_n = n^{-1} \sum_{i=1}^{n} S_i S_i'$ (see next section), and is commonly used for linear instrumental variables analysis with multiple instruments; see Palmer et al. (2012) for its use with Mendelian randomisation studies.

4.2 Multiplicative SMM

The saturated multiplicative SMM for the two instruments is

$$\frac{E(Y \mid X, Z_1, Z_2)}{E(Y_0 \mid X, Z_1, Z_2)} = \exp\{(\psi_0 + \psi_1 Z_1 + \psi_2 Z_2) X\},$$

where NEM here corresponds to $\psi_1 = \psi_2 = 0$. Using the same vector of instrumental variables $S$, the multiplicative SMM moment conditions can be written as

$$E\left[\left\{\frac{Y}{\exp(X\psi_0)} - \alpha_0\right\} S\right] = 0. \tag{17}$$

Letting $\alpha_0^* = \log(\alpha_0)$, it is easy to show that (17) also implies

$$E\left[\left\{\frac{Y - \exp(\alpha_0^* + X\psi_0)}{\exp(X\psi_0)}\right\} S\right] = 0. \tag{18}$$

For example, consider a GMM estimator based on moment condition (17). The GMM estimator for $\delta = (\alpha_0, \psi_0)'$ is the solution to (11) with $g(\delta) = \{Y \exp(-X\psi_0) - \alpha_0\} S$. The one-step GMM estimator $\hat{\delta}_1$ is obtained by choosing the weight matrix in (11) to be $W_n = n^{-1} \sum_{i=1}^{n} S_i S_i'$. The two-step GMM estimator $\hat{\delta}_2$ is obtained by estimating the weight matrix

$$W_n(\hat{\delta}_1) = n^{-1} \sum_{i=1}^{n} g_i(\hat{\delta}_1) g_i'(\hat{\delta}_1),$$

using the one-step GMM estimator $\hat{\delta}_1$. Under standard regularity conditions, the limiting distributions of the one-step and two-step GMM estimators are

$$n^{1/2}(\hat{\delta}_1 - \delta_0) \xrightarrow{d} N\{0, (C_0 W^{-1} C_0')^{-1} C_0 W^{-1} \Omega_0 W^{-1} C_0' (C_0 W^{-1} C_0')^{-1}\},$$

$$n^{1/2}(\hat{\delta}_2 - \delta_0) \xrightarrow{d} N\{0, (C_0 \Omega_0^{-1} C_0')^{-1}\},$$

respectively, where $\delta_0$ is the true parameter value, $\xrightarrow{d}$ indicates convergence in distribution, $N$ indicates a normally distributed random vector,

$$C_0 = E\{\partial g'(\delta_0)/\partial\delta\}, \qquad \Omega_0 = E\{g(\delta_0) g'(\delta_0)\},$$

and $W = E(S_i S_i')$ is the probability limit of the one-step GMM estimator’s weight matrix.

Chamberlain (1987) shows that the two-step GMM estimator is semiparametrically efficient when the instruments are mutually exclusive indicators that follow a multinomial distribution, as is the case in this example provided that there are no continuous covariates or instruments. More generally, as will be discussed in Section 5, one must derive the efficient instrument $d(S) = d_{\mathrm{opt}}(S)$ for the GMM estimator to be semiparametrically efficient.

A useful property of two-step GMM for over-identified models is that it admits the use of the Hansen J-test, which can be used to assess the validity of the moment conditions; see Hansen (1982). The test statistic and its limiting distribution (under the null hypothesis that the moment conditions are valid) are given by

$$J(\hat{\delta}_2) = n \left\{ n^{-1} \sum_{i=1}^{n} g_i(\hat{\delta}_2) \right\}' W_n^{-1}(\hat{\delta}_1) \left\{ n^{-1} \sum_{i=1}^{n} g_i(\hat{\delta}_2) \right\} \xrightarrow{d} \chi^2_q,$$

where $\chi^2_q$ indicates a chi-squared random variable with $q$ degrees of freedom, and $q$ is the number of moment conditions by which the model is over-identified (e.g., $q = 1$ in this illustration).
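The one-step estimator, the two-step estimator and the J statistic described above can be sketched numerically as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes binary $X$, so that $\exp(-X\psi_0) = (1 - X) + X\exp(-\psi_0)$ and moment condition (17) is linear in $(\alpha_0, \theta)$ with $\theta = \exp(-\psi_0)$, which gives closed-form GMM solutions. The simulated design and the function names `simulate`, `linear_gmm` and `fit_mult_smm` are hypothetical.

```python
import numpy as np

def simulate(n, psi0=0.6, seed=0):
    """Hypothetical design: U confounds X and Y; Z is a valid three-valued instrument."""
    rng = np.random.default_rng(seed)
    z = rng.choice(3, size=n, p=[0.5, 0.3, 0.2])
    u = rng.uniform(size=n)
    x = (rng.uniform(size=n) < 0.2 + 0.15 * z + 0.3 * (u - 0.5)).astype(float)
    # P(Y=1 | X, U) = (0.1 + 0.18 U) exp(psi0 X), hence E{Y exp(-psi0 X) | Z} = 0.19
    y = (rng.uniform(size=n) < (0.1 + 0.18 * u) * np.exp(psi0 * x)).astype(float)
    return y, x, z

def linear_gmm(c, D, S, W):
    """Minimise gbar' W^{-1} gbar, where gbar = S'(c - D delta) / n."""
    n = len(c)
    SD, Sc = S.T @ D / n, S.T @ c / n
    Winv = np.linalg.inv(W)
    return np.linalg.solve(SD.T @ Winv @ SD, SD.T @ Winv @ Sc)

def fit_mult_smm(y, x, z):
    n = len(y)
    S = np.column_stack([np.ones(n), z == 1, z == 2]).astype(float)
    # Residual of (17) for binary X: v = Y(1-X) + YX*theta - alpha0 = c - D @ (alpha0, theta)
    c = y * (1 - x)
    D = np.column_stack([np.ones(n), -y * x])
    W1 = S.T @ S / n                          # one-step weight matrix
    d1 = linear_gmm(c, D, S, W1)
    v1 = c - D @ d1
    W2 = (S * v1[:, None] ** 2).T @ S / n     # n^{-1} sum_i g_i g_i' at the one-step estimate
    d2 = linear_gmm(c, D, S, W2)
    gbar = S.T @ (c - D @ d2) / n
    J = n * gbar @ np.linalg.solve(W2, gbar)  # compare with chi-squared, 3 - 2 = 1 df
    return {"alpha0": d2[0], "psi0": -np.log(d2[1]),
            "psi0_one_step": -np.log(d1[1]), "J": J}
```

Because the GMM criterion depends on $\psi_0$ only through $\theta$, minimising over $\theta$ and transforming back gives the same point estimate as minimising over $\psi_0$ directly, provided the estimate of $\theta$ is positive. Calling `fit_mult_smm(*simulate(200_000))` should return a value of `psi0` near 0.6 in this hypothetical design.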


4.3 Double-Logistic SMM

Under NEM, the logistic SMM for the two instruments is

$$\mathrm{logit}\{E(Y \mid X, Z_1, Z_2)\} - \mathrm{logit}\{E(Y_0 \mid X, Z_1, Z_2)\} = \psi_0 X,$$

and its association model is

$$E(Y \mid X, Z_1, Z_2) = \mathrm{expit}\{m_\beta(X, Z_1, Z_2)\}, \tag{20}$$

where $m_\beta(X, Z_1, Z_2) = \beta_0 + \beta_1 X + \beta_2 Z_1 + \beta_3 Z_2 + \beta_4 X Z_1 + \beta_5 X Z_2$ is saturated. We describe two estimation methods: first, where the parameters in the saturated association model are estimated by maximum likelihood and then plugged into the estimating equations for the double-logistic SMM; and, second, where all parameters are estimated jointly in a similar manner to that proposed by Vansteelandt and Goetghebeur (2003) and Bowden and Vansteelandt (2011).

Denoting $\hat{\beta}$ as the maximum likelihood estimator of $\beta$, it follows that

$$E\{g(\delta; \hat{\beta})\} = E[\{q(\psi_0; \hat{\beta}) - \alpha_0\} S] = 0, \tag{21}$$

where $\delta = (\psi_0, \alpha_0)'$, $q(\psi_0; \beta) = \mathrm{expit}\{m_\beta(X, Z_1, Z_2) - X\psi_0\}$ and $S = (1, Z_1, Z_2)'$. Point estimation is carried out exactly as before, but standard error estimates obtained by fixing $\hat{\beta}$ and plugging it into the asymptotic covariance matrices presented above will be biased because the first stage estimation of $\beta$ is ignored; see the discussion in Section 2.3. However, theory for “two-stage” GMM estimators (2SGMM) has been developed by Gourieroux, Monfort and Renault (1996). The 2SGMM $\hat{\delta}_{1,\hat{\beta}}$ is the solution to (11) and its asymptotic distribution is

$$n^{1/2}(\hat{\delta}_{1,\hat{\beta}} - \delta_0) \xrightarrow{d} N\{0, (C_0 W^{-1} C_0')^{-1} C_0 W^{-1} \Omega_\beta W^{-1} C_0' (C_0 W^{-1} C_0')^{-1}\},$$

where $C_0$ and $W$ are both defined as above, and $\Omega_\beta$ is the asymptotic variance of the limiting normal distribution of

$$n^{-1/2} \sum_{i=1}^{n} g_i(\delta_0; \beta_0) + E\left\{\frac{\partial g(\delta_0; \beta_0)}{\partial \beta'}\right\} n^{1/2}(\hat{\beta} - \beta_0),$$

which has the consistent estimator

$$\hat{\Omega}_\beta = n^{-1}\left\{ \sum_{i=1}^{n} g_i g_i' + G_\beta' V(\hat{\beta}) G_\beta + G_\beta' V(\hat{\beta}) \left( \sum_{i=1}^{n} Q_i R_i g_i' \right) + \left( \sum_{i=1}^{n} Q_i g_i R_i' \right) V(\hat{\beta}) G_\beta \right\},$$

with $g_i = g_i(\hat{\delta}_{1,\hat{\beta}}; \hat{\beta})$, $G_\beta = \sum_i \partial g_i'(\hat{\delta}_{1,\hat{\beta}}; \hat{\beta})/\partial\beta$, $V(\hat{\beta}) = \{\sum_{i=1}^{n} \hat{\mu}_i (1 - \hat{\mu}_i) R_i R_i'\}^{-1}$, $R_i = (1, X_i, Z_{1i}, Z_{2i}, X_i Z_{1i}, X_i Z_{2i})'$, $\hat{\mu}_i = \mathrm{expit}\{m_{\hat{\beta}}(X_i, Z_{1i}, Z_{2i})\}$ and $Q_i = Y_i - \hat{\mu}_i$. Furthermore, $\hat{\Omega}_\beta$ is also the weight matrix for the asymptotically efficient two-step 2SGMM estimator, and so the limiting distribution of the Hansen J-test statistic (with $W_n = \hat{\Omega}_\beta$) is also valid.

Vansteelandt and Goetghebeur (2003) developed estimating equations for the double-logistic SMM by expanding its system of estimating equations to include those for the association model. As in Bowden and Vansteelandt (2011), a joint GMM estimator can be obtained by applying the GMM estimator to

$$g(\delta; \beta) = \begin{pmatrix} [Y - \mathrm{expit}\{m_\beta(X, Z_1, Z_2)\}] R \\ [\mathrm{expit}\{m_\beta(X, Z_1, Z_2) - \psi_0 X\} - \alpha_0] S \end{pmatrix}, \tag{22}$$

where $R$ is defined above and $\delta = (\alpha_0, \psi_0)'$. Gourieroux, Monfort and Renault (1996) show that the asymptotic distributions of the 2SGMM and the joint GMM estimators are the same. An important advantage of using the joint moments (22) is that standard GMM software can be used to make asymptotically correct inferences about the target parameter $\psi_0$. Further details on how the gmm command in Stata and the gmm() function in R can be used to implement these estimators are given in the Appendix.
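The plug-in step of the two-stage approach can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: because the association model (20) is saturated in $(X, Z)$, its maximum likelihood fitted probabilities are the observed cell proportions of $Y$, so no iterative logistic fit is needed here. The function names `saturated_fitted` and `plug_in_moments` are hypothetical, the instrument is assumed to take the values 0, 1, 2, and every $(X, Z)$ cell is assumed to be non-empty with a proportion strictly between 0 and 1.

```python
import numpy as np

def expit(a):
    return 1.0 / (1.0 + np.exp(-a))

def logit(p):
    return np.log(p / (1.0 - p))

def saturated_fitted(y, x, z):
    """Fitted probabilities from the saturated association model: cell means of Y."""
    mu = np.empty(len(y))
    for xv in (0, 1):
        for zv in np.unique(z):
            cell = (x == xv) & (z == zv)
            mu[cell] = y[cell].mean()
    return mu

def plug_in_moments(alpha0, psi0, y, x, z):
    """Sample analogue of (21): mean of {q(psi0; beta_hat) - alpha0} S."""
    n = len(y)
    mu = saturated_fitted(y, x, z)
    q = expit(logit(mu) - psi0 * x)   # predicted treatment-free probability
    S = np.column_stack([np.ones(n), z == 1, z == 2]).astype(float)
    return ((q - alpha0)[:, None] * S).mean(axis=0)
```

A GMM routine would minimise a quadratic form in this three-element vector over $(\alpha_0, \psi_0)$. As noted above, standard errors from such a plug-in fit need the 2SGMM correction, or the joint moments (22), to account for the estimation of $\beta$.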


5. COMBINING MULTIPLE INSTRUMENTS 


Bowden and Vansteelandt (2011) derive the optimally efficient combination of instruments and, for practical purposes, a simplified expression for this combination. We consider the particular case of SMMs without covariates where identification is obtained using orthogonal binary instruments. In such cases, we show that the one-step GMM estimator combines the instruments as in Bowden and Vansteelandt (2011) under the simplifying assumption of a constant variance, and that the two-step GMM estimator combines the instruments optimally.

First consider the one-step GMM estimator by noting that it is the solution obtained by setting the first derivative of (11) to zero. For the multiplicative SMM based on (17), this gives

$$\left\{ n^{-1} \sum_{i=1}^{n} \frac{\partial g_i'(\delta)}{\partial\delta} \right\} W_n^{-1} \left\{ n^{-1} \sum_{i=1}^{n} g_i(\delta) \right\} = -\left\{ n^{-1} \sum_{i=1}^{n} \begin{pmatrix} 1 \\ Y_i X_i \exp(-X_i\psi_0) \end{pmatrix} S_i' \right\} W_n^{-1} \left\{ n^{-1} \sum_{i=1}^{n} g_i(\delta) \right\} = 0,$$

where $g_i(\delta) = \{Y_i \exp(-X_i\psi_0) - \alpha_0\} S_i$. This system can be expressed as

$$\mathbf{B}' \mathbf{S} (\mathbf{S}'\mathbf{S})^{-1} \mathbf{S}' \mathbf{v} = 0,$$

where $\mathbf{B} = \{b_i'\}$ and $\mathbf{S} = \{S_i'\}$ are the matrices formed by stacking the vectors $b_i' = (1, Y_i X_i \exp(-X_i\psi_0))$ and $S_i'$, respectively, and $\mathbf{v} = \{v_i\}$ is a column vector with elements given by $v_i = Y_i \exp(-X_i\psi_0) - \alpha_0$. It is thus apparent that the GMM estimator combines the instruments in the projection $\mathbf{S}(\mathbf{S}'\mathbf{S})^{-1}\mathbf{S}'\mathbf{B}$, that is, the multiple instruments for each individual are replaced by the linear projection of $b_i$ onto the space spanned by $S$; alternatively put, the combined instrumental variable can be thought of as the prediction from a linear regression of $b_i$ on the instruments $S_i$.
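The projection described above can be sketched in a few lines. The example below is illustrative only and the function name `project_on_indicators` is hypothetical. It uses one indicator per level of $Z$, which spans the same space as $(1, Z_1, Z_2)$, and shows that with mutually exclusive indicators the fitted value for each individual is simply the mean of $b_i$ among individuals with the same instrument value.

```python
import numpy as np

def project_on_indicators(b, z, levels):
    """Return S (S'S)^{-1} S' b, where S holds one indicator column per level of z."""
    S = (z[:, None] == np.asarray(levels)[None, :]).astype(float)
    coef = np.linalg.solve(S.T @ S, S.T @ b)   # S'S is diagonal: cell counts
    return S @ coef

# Small usage example with simulated values of b_i
rng = np.random.default_rng(0)
z = rng.integers(0, 3, size=300)
b = rng.normal(size=300)
fitted = project_on_indicators(b, z, [0, 1, 2])
```

Each entry of `fitted` equals the average of `b` within the corresponding instrument cell, which is why the combined instrument in this setting is a function of cell means only.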

For the binary variables case considered here, we have that

$$Y_i X_i \exp(-X_i\psi_0) = Y_i X_i \exp(-\psi_0), \tag{23}$$

so that the one-step GMM can be thought of as combining the instruments simply using the linear projection of $YX$ onto the space spanned by $S$. The one-step GMM estimator for the double-logistic SMM also has the form of a linear projection of $b_i$ onto the space spanned by $S$, but here $b_i' = (1, q_i(\psi_0; \beta)\{1 - q_i(\psi_0; \beta)\} X_i)$. For both the multiplicative and logistic SMMs, these are the simplified combinations of multiple instruments of Bowden and Vansteelandt (2011).

In the simple setup involving only binary variables, the one-step GMM estimator for the multiplicative SMM can be expressed as a linear 2SLS estimator. Following Angrist (2001), note that $\exp(-\psi_0 X) = (1 - X) + X\exp(-\psi_0)$ and, therefore,

$$Y \exp(-\psi_0 X) - \alpha_0 = Y(1 - X) + YX\exp(-\psi_0) - \alpha_0.$$

Hence, the moment conditions can be expressed as the linear [in $\exp(-\psi_0)$] moments

$$E[\{Y(1 - X) + YX\exp(-\psi_0) - \alpha_0\} S] = 0, \tag{24}$$

from which we see that the one-step GMM estimator for $\exp(-\psi_0)$ using moment condition (17) is identical to the 2SLS estimator from regressing $Y(X - 1)$ on $\widehat{YX}$, where $\widehat{YX}$ are the predictions from the linear regression of $YX$ on $S$.

Multiplying (24) by the risk ratio $\exp(\psi_0)$, we obtain

$$E[\{YX + Y(1 - X)\exp(\psi_0) - \gamma_0\} S] = 0, \tag{25}$$

where $\gamma_0 = \alpha_0 \exp(\psi_0)$. In this case, the same estimator as the one-step GMM estimator for $\exp(\psi_0)$ is obtained from a linear instrumental variable estimator where $(X - 1)Y$ is instrumented by $\widehat{YX}$. We will use this result later in Section 7 when deriving results for local risk ratios.

We now move on to the optimal combination of instruments. As we discussed at the end of Section 4.2, Chamberlain (1987) established efficiency results for GMM estimators. We describe these results in terms of a simple multiplicative SMM and its three moment conditions

$$E\{Y \exp(-X\psi_0) - \alpha_0 \mid Z = z\} = 0, \tag{26}$$

for $z = 0, 1, 2$. As shown previously, the instruments can be represented by the vector of orthogonal binary instruments $S$ and the generalized residual

$$v(Y, X; \delta_0) = Y \exp(-X\psi_0) - \alpha_0,$$

where $\delta_0 = (\alpha_0, \psi_0)'$. Using the notation of Newey (1993), the efficient instrument is

$$d_{\mathrm{opt}}(S) = Q\, e(S)\, \sigma^{-2}(S), \tag{27}$$

where $Q$ is any nonsingular matrix,

$$e(S) = E\left\{ \frac{\partial v(Y, X; \delta_0)}{\partial\delta} \,\Big|\, S \right\} = -\begin{pmatrix} 1 \\ E\{Y \exp(-X\psi_0) X \mid S\} \end{pmatrix},$$

$$\sigma^2(S) = E\{v^2(Y, X; \delta_0) \mid S\} = E[\{Y \exp(-X\psi_0) - \alpha_0\}^2 \mid S],$$

which leads to a GMM estimator with asymptotic covariance

$$\Lambda = [E\{e(S) e(S)'/\sigma^2(S)\}]^{-1}.$$
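The equivalence between the one-step GMM estimator based on (24) and a linear 2SLS fit, noted earlier in this section, can be checked numerically. The sketch below is illustrative and assumes binary $X$ and $Y$; the function names `one_step_gmm` and `two_stage_least_squares` are hypothetical. Writing $\theta = \exp(-\psi_0)$, the 2SLS regression of $Y(X - 1)$ on a constant and $\widehat{YX}$ has intercept $-\alpha_0$ and slope $\theta$.

```python
import numpy as np

def one_step_gmm(y, x, S):
    """One-step GMM for (alpha0, theta) from moments (24) with W_n = S'S / n."""
    n = len(y)
    c = y * (1 - x)
    D = np.column_stack([np.ones(n), -y * x])   # residual v = c - D @ (alpha0, theta)
    SD, Sc = S.T @ D / n, S.T @ c / n
    Winv = np.linalg.inv(S.T @ S / n)
    return np.linalg.solve(SD.T @ Winv @ SD, SD.T @ Winv @ Sc)

def two_stage_least_squares(y, x, S):
    """Regress Y(X-1) on a constant and the first-stage prediction of YX."""
    n = len(y)
    yx = y * x
    yx_hat = S @ np.linalg.lstsq(S, yx, rcond=None)[0]   # first stage: YX on S
    R = np.column_stack([np.ones(n), yx_hat])
    return np.linalg.lstsq(R, y * (x - 1), rcond=None)[0]  # (intercept, slope)

# Usage on arbitrary simulated binary data
rng = np.random.default_rng(0)
n = 2000
z = rng.integers(0, 3, size=n)
x = (rng.uniform(size=n) < 0.2 + 0.15 * z).astype(float)
y = (rng.uniform(size=n) < 0.2 + 0.2 * x).astype(float)
S = np.column_stack([np.ones(n), z == 1, z == 2]).astype(float)
alpha0_hat, theta_hat = one_step_gmm(y, x, S)
intercept, slope = two_stage_least_squares(y, x, S)
```

In this sketch `theta_hat` coincides with `slope` and `alpha0_hat` with `-intercept` up to floating-point error, whatever the data-generating process, because both estimators solve the same linear system.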


Chamberlain (1987) showed that, when $S$ comprises multinomially distributed multiple orthogonal binary instruments such that $SS' = \mathrm{diag}(S)$, the asymptotic covariance of the two-step GMM estimator satisfies $(C_0 \Omega_0^{-1} C_0')^{-1} = \Lambda$. Hence, we can derive the optimum combination of instrumental variables from the first-order condition for the two-step GMM estimator:

$$\left\{ n^{-1} \sum_{i=1}^{n} \frac{\partial g_i'(\delta)}{\partial\delta} \right\} W_n^{-1}(\hat{\delta}_1) \left\{ n^{-1} \sum_{i=1}^{n} g_i(\delta) \right\} = -\left\{ n^{-1} \sum_{i=1}^{n} \begin{pmatrix} 1 \\ Y_i \exp(-X_i\psi_0) X_i \end{pmatrix} S_i' \right\} W_n^{-1}(\hat{\delta}_1) \left\{ n^{-1} \sum_{i=1}^{n} g_i(\delta) \right\} = 0,$$

where $g_i(\delta) = \{Y_i \exp(-X_i\psi_0) - \alpha_0\} S_i$, $S_i = (Z_{i0}, Z_{i1}, Z_{i2})'$ with $Z_{ij} = I(Z_i = j)$, and

$$W_n(\hat{\delta}_1) = n^{-1} \sum_{i=1}^{n} \{Y_i \exp(-X_i\hat{\psi}_1) - \hat{\alpha}_1\}^2 S_i S_i' = n^{-1} \begin{pmatrix} \sum_{i: Z_i = 0} \{Y_i \exp(-X_i\hat{\psi}_1) - \hat{\alpha}_1\}^2 & 0 & 0 \\ 0 & \sum_{i: Z_i = 1} \{Y_i \exp(-X_i\hat{\psi}_1) - \hat{\alpha}_1\}^2 & 0 \\ 0 & 0 & \sum_{i: Z_i = 2} \{Y_i \exp(-X_i\hat{\psi}_1) - \hat{\alpha}_1\}^2 \end{pmatrix}.$$

As before, let the matrices $\mathbf{S}$ and $\mathbf{B}$ be defined as $\mathbf{B} = \{b_i'\}$ and $\mathbf{S} = \{S_i'\}$, obtained by stacking the vectors $b_i' = (1, Y_i \exp(-X_i\psi_0) X_i)$ and $S_i'$, respectively; then the way the two-step GMM estimator combines the multiple instruments is given by

$$\mathbf{S} W_n^{-1}(\hat{\delta}_1) \mathbf{S}'\mathbf{B} \propto \mathbf{S}\, \mathrm{diag}\left( n_z^{-1} \sum_{i: Z_i = z} v^2(Y_i, X_i; \hat{\delta}_1) \right)^{-1} (\mathbf{S}'\mathbf{S})^{-1} \mathbf{S}'\mathbf{B},$$

where $n_z$ is the number of observations with $Z_i = z$, which is a consistent estimate for the optimal instruments. Chamberlain (1987) further showed that $\Lambda$ is also the lower bound for the asymptotic variance of any consistent asymptotically normally distributed estimator of a semiparametric model where the only substantive restriction imposed on the distribution of the data is (26).

6. MONTE CARLO STUDIES

6.1 Multiplicative SMM

We now present two Monte Carlo simulation studies to demonstrate the properties of GMM estimators with multiple orthogonal binary instruments in models without covariates. First, we consider the multiplicative SMM by generating data from population model $M_1$, which satisfies the multiplicative SMM under both the NEM and CMI restrictions. Population model $M_1$ is defined so that

$$E(Y \mid X, Z_1, Z_2) = \exp\{\beta_0 + (\beta_1 + \psi_0) X + \beta_2 Z_1 + \beta_3 Z_2 + \beta_4 X Z_1 + \beta_5 X Z_2\},$$

where $\psi_0 = 0.6$ is the treatment effect. To define the distribution of the observed data, we further define $Z$ to follow the marginal distribution given by $P(Z = 1) = 0.3$ and $P(Z = 2) = 0.2$, and $P(X = 1 \mid Z = z) = p_{10} + 0.15 \times z$ for $z = 0, 1, 2$. To define the joint distribution of the observed and potential outcomes, we set the expected treatment-free outcome in the population to be $\alpha_0 = E(Y_0) = 0.19$, which leads to $\alpha_0^* = \log(\alpha_0) = -1.6607$ in moment conditions (18) and (19), and $E(Y) = 0.25$, $\beta_1 = 0.15$, $\beta_4 = 0.6$ and $\beta_5 = -0.6$. The other parameter values are then numerically found in order for CMI and NEM to hold: $\beta_0 = -1.6976$, $\beta_2 = -0.3186$, $\beta_3 = 0.2511$ and $p_{10} = 0.2321$.

Table 1 presents some estimation results for 10,000 samples of size 10,000 drawn from population model $M_1$. Three different versions of the GMM estimator are applied: the first column of Table 1 contains the results of the just-identified model using the multivalued instrument $Z \in \{0, 1, 2\}$ as a single instrument so that $S = (1, Z)'$; in the second and third columns, we present the one- and two-step GMM estimates for moment conditions (18) and (19), respectively, using multiple instruments so that $S = (1, Z_1, Z_2)'$.

Table 1
Monte Carlo estimation results for multiplicative SMM

                      Single instrument    Multiple instruments
Instruments S         1, Z                 1, Z_1, Z_2    1, Z_1, Z_2
Moment conditions     (18) or (19)         (18)           (19)

One-step GMM
  alpha_0^*           -1.6614              -1.6628        -1.6599
                      (0.0839)             (0.0561)       (0.0561)
                      [0.0843]             [0.0566]       [0.0565]
  psi_0                0.6151               0.6102         0.6033
                      (0.2175)             (0.1358)       (0.1353)
                      [0.2168]             [0.1361]       [0.1356]

Two-step GMM
  alpha_0^*                                -1.6629        -1.6598
                                           (0.0561)       (0.0561)
                                           [0.0565]       [0.0565]
  psi_0                                     0.6095         0.6024
                                           (0.1355)       (0.1350)
                                           [0.1359]       [0.1353]
  Hansen J                                  0.9806         0.9793
  Rej. freq. 5%                             0.0478         0.0475

Notes: Sample size 10,000; means based on 10,000 Monte Carlo replications; std. errors in parentheses; means of estimated standard errors in square brackets; data drawn from population model $M_1$ as described in Section 6.1; $\alpha_0^* = -1.6607$ and $\psi_0 = 0.6$.

All of the estimators display a small positive bias for $\psi_0 = 0.6$, and the mean estimated standard errors are very close to the true standard errors. Among the two estimators using multiple instruments, this bias is slightly larger for the estimator based on moment condition (18). There is here a negligible gain in precision from using the two-step GMM estimator as compared to the one-step estimator. However, there is a substantial gain in efficiency from using two instrumental variables rather than one, with the standard error decreasing from 0.22 for the just-identified model to 0.14 for the two-step GMM estimators. This is because the GMM projection (23) in this case is not linear in $Z$, even though the conditional probabilities $P(X = 1 \mid Z)$ are. More specifically, the coefficient on $Z_2$ in the regression of $YX$ on $(1, Z_1, Z_2)$ from (23) is actually smaller than that of $Z_1$. Under this particular population model (but not generally) the relationship between the coefficients is roughly linear: the average coefficient on $Z_1$ is equal to 0.1067 and for $Z_2$ it equals 0.0557. Hence, a single instrument that takes the value 1 if $Z = 2$ and 2 if $Z = 1$ leads to a just-identified estimator which is likely to be almost as efficient as the over-identified GMM estimators. Further simulations show that this is indeed the case, with the just-identified estimator for $\psi_0$ just described having an average of 0.6077 and a standard error of 0.1375, which are both virtually identical to those of the over-identified GMM estimators.

We repeated the analysis above for a design similar to $M_1$ but with the instrument $Z$ taking the six values $0, 1, \ldots, 5$; full details of this design are available from the authors. The GMM estimators are again well behaved. Using moment conditions (19), the mean based on 10,000 Monte Carlo estimates using the two-step GMM estimator is 0.5966 with a standard error of 0.0801; the mean estimated standard error equals 0.0806. The rejection frequency of the J-test is 5.1% at the 5% level.

Returning to the design with $Z$ taking the values 0, 1, 2, we modify population model $M_1$ so as to study how the multiplicative GMM performs when $Z$ does not satisfy the key conditions of an instrumental variable. We do this by keeping all $M_1$ parameters the same but making the “instrument” $Z_1$ invalid. This is done by specifying

$$E(Y \mid X, Z_1, Z_2) = \exp\{\beta_0 + (\beta_1 + \psi_0) X + (\beta_2 + \phi) Z_1 + \beta_3 Z_2 + \beta_4 X Z_1 + \beta_5 X Z_2\},$$

with $\phi = 0.15$. In this case, the CMI assumption is violated as $E[Y_0 \mid Z = 0] = E[Y_0 \mid Z = 2] = 0.19$ as before, but now $E[Y_0 \mid Z = 1] = 0.2207$. The GMM estimators are now severely biased upwards. The mean based on 10,000 Monte Carlo estimates of the two-step GMM estimator using moments (19) is equal to 1.1191, with a standard error of 0.1681. The mean (variance) of Hansen’s J-test is equal to 3.56 (3.70) with a rejection frequency at the 5% level of 34%. If instead we change the coefficient on $Z_2$ to $\beta_3 + 0.15$, we get a much smaller bias, with the mean (std. error) of the estimator equal to 0.6452 (0.1370), but the rejection frequency of the J-test is now much larger, namely, 93% at the 5% level. This difference is due to the fact that, as highlighted above, in this case $Z_1$ is a stronger instrument than $Z_2$, in the sense that the SMM estimator is more precise when using $Z_1$ than when using $Z_2$ as an instrument. For example, in the original design where both instruments are valid, using only $Z_1$ as an instrument resulted in the median of the 10,000 estimates being equal to 0.6009 with the interquartile range equal to 0.1967, whereas using only $Z_2$ as an instrument resulted in a median of 0.6242, with a much larger interquartile range of 1.5253. If the bias is due to a violation of the CMI assumption for $Z_1$, the estimator based on $Z_2$ does not have enough precision to reject the null that both moment conditions are valid as frequently as when $Z_2$ is invalid; in the latter case the estimator based on $Z_1$ is more precise and the test has more power.

6.2 Logistic SMM 

To investigate the performance of the GMM estimators for the logistic SMM, we generate data from population $M_2$ satisfying the logistic SMM model and its corresponding NEM and CMI identification restrictions. More specifically, the data are generated from

$$E(Y \mid X, Z_1, Z_2) = \mathrm{expit}\{\beta_0 + (\beta_1 + \psi_0) X + \beta_2 Z_1 + \beta_3 Z_2 + \beta_4 X Z_1 + \beta_5 X Z_2\},$$

where the treatment effect is again $\psi_0 = 0.6$. Similarly to model $M_1$, we set $P(Z = 1) = 0.3$, $P(Z = 2) = 0.2$, $P(X = 1 \mid Z = z) = p_{10} + 0.15 \times z$, $E(Y_0) = 0.19$, $E(Y) = 0.25$, $\beta_1 = 0.15$, $\beta_4 = -0.6$ and $\beta_5 = 0.6$. The other parameters are such that CMI and NEM hold: $\beta_0 = -1.518$, $\beta_2 = 0.3183$, $\beta_3 = -0.5202$, and $p_{10} = 0.4404$.

Table 2 contains estimation results for 10,000 samples of size 10,000 drawn from population model $M_2$. Three different versions of the GMM estimator for the logistic SMM are applied: the first column of Table 2 contains the results of the just-identified model using multivalued $Z$ as a single instrument; in the second column, we present the one- and two-step GMM estimates for the 2SGMM using multiple instruments; and the third column contains the corresponding results for the joint-GMM estimator based on (22). Both the 2SGMM and joint-GMM estimators use saturated logistic models for $\beta$ as in (20).

All of the estimators are virtually unbiased and the means of the estimated standard errors are close to Monte Carlo standard errors. There is an efficiency gain from using the instruments separately: the standard error in the just-identified case is 0.1905, compared to 0.1729 for the 2SGMM estimator. The performances of the 2SGMM estimator and the GMM estimator using the joint moment conditions are virtually identical. The Hansen J-tests are well behaved in both cases. There is no efficiency gain from using the two-step GMM estimators as compared to the one-step estimators in this design.

Table 2
Monte Carlo estimation results for logistic SMM

                      Single instrument    Multiple instruments
Instruments S         1, Z                 1, Z_1, Z_2    1, Z_1, Z_2
Moment conditions     Joint/2SGMM          2SGMM          Joint-GMM

One-step GMM
  alpha_0              0.1912               0.1905         0.1907
                      (0.0168)             (0.0153)       (0.0153)
                      [0.0167]             [0.0152]       [0.0152]
  psi_0                0.5970               0.6033         0.6001
                      (0.1905)             (0.1729)       (0.1731)
                      [0.1899]             [0.1722]       [0.1721]

Two-step GMM
  alpha_0                                   0.1904         0.1911
                                           (0.0153)       (0.0154)
                                           [0.0152]       [0.0152]
  psi_0                                     0.6038         0.5957
                                           (0.1729)       (0.1735)
                                           [0.1722]       [0.1722]
  Hansen J                                  0.9882         0.9827
  Rej. freq. 5%                             0.0503         0.0495

Notes: Sample size 10,000; means based on 10,000 Monte Carlo replications; std. errors in parentheses; means of estimated standard errors in square brackets; data drawn from population model $M_2$ as described in Section 6.2; $\alpha_0 = 0.19$ and $\psi_0 = 0.6$.

As with the multiplicative SMM, we also find that the estimators behave well for instruments with 6 or even 11 values, although we find that the 2SGMM estimator has a small upward finite sample bias in the designs we considered. For example, for an instrument with values $0, 1, 2, \ldots, 10$, we get means (std. error) of the two-step GMM estimates of 0.6323 (0.1073) for 2SGMM and 0.5999 (0.1066) for the joint moments GMM estimator. Details of this design are available from the authors.

Finally, we return to the design with $Z$ taking the values 0, 1, 2, and modify population model $M_2$ so as to study how these estimators perform when $Z$ is not a valid instrumental variable. We keep all parameters the same but make the “instrument” $Z_2$ invalid, by changing the parameter of $Z_2$ to $\beta_3 + \tau$ with $\tau = 0.25$. The GMM estimators are now severely biased upwards. The mean of 10,000 Monte Carlo estimates of the two-step GMM estimator using the joint moments (22) is equal to 1.2805, with a standard error of 0.1511. However, in this case the mean (variance) of Hansen’s J-test is equal to 1.26 (3.09), with a rejection frequency at the 5% level of only 8.5%. In contrast, if we instead change the parameter of $Z_1$ to $\beta_2 + \tau$ with $\tau = 0.1$, the estimator has a much smaller bias, with a mean of 0.5527 and standard error of 0.1660, but the J-test has much more power in this case as it rejects 49.4% of the time at the 5% level. This is explained by the fact that here $Z_2$ is a stronger instrument than $Z_1$.


7. LOCAL AVERAGE TREATMENT EFFECTS 


The parameters of the SMMs we have considered thus far are all identified by the assumption of no effect modification by the instruments (NEM). For the case where we have two instruments $Z_1$ and $Z_2$, recall that the NEM assumption for the identification of the conditional causal relative risk is that

$$\frac{E(Y \mid X, Z_1, Z_2)}{E(Y_0 \mid X, Z_1, Z_2)} = \exp(\psi_0 X),$$

that is, the instruments $Z_1$ and $Z_2$ do not modify the causal effect of $X$ on the risk. In this section, we consider how the failure of NEM impacts on GMM estimators for additive and multiplicative SMMs with multiple instruments.

Clarke and Windmeijer (2010) review identification results concerning the additive and multiplicative SMMs in the simple case of a single binary instrument where both $X$ and $Y$ are also binary. If the NEM assumption fails, then a causal effect is identified if the instrument $Z$ has a causal effect on treatment $X$ and selection is “monotonic”. In this simple case, where $Z$ is randomised treatment assignment and $X$ is the selected treatment, selection is monotonic if

$$P(X_1 - X_0 \geq 0) = 1,$$

that is, subjects cannot defy their treatment assignments in every potential scenario, so that $\{X_1 = 0, X_0 = 1\}$ has zero probability. Under monotonicity, the additive SMM estimator (13) identifies the “local average treatment effect” (LATE), and the multiplicative SMM identifies the “local risk ratio” (LRR), where

$$\mathrm{LATE} = E(Y_1 - Y_0 \mid X_1 > X_0); \qquad \mathrm{LRR} = \frac{E(Y_1 \mid X_1 > X_0)}{E(Y_0 \mid X_1 > X_0)}.$$

LATE is the average treatment effect for the subgroup of subjects who actually and counterfactually accept the treatments to which they have been assigned, that is, $X_1 = 1$ and $X_0 = 0$; for this reason, these subjects are also known as “compliers” and LATE is also known as the “complier average causal effect” (CACE). The logistic SMM does not estimate a local causal effect when NEM fails, but for binary outcomes the local odds ratio can be estimated by taking the ratio of LRR estimates obtained by fitting multiplicative SMMs to binary $Y$ and $1 - Y$.

If we have two instruments, then these instruments could in principle define two different local causal effects, provided that the two instruments can be combined into a single multivalued instrument. We consider using the single $K$-valued instrument $Z \in \{0, 1, 2, \ldots, K - 1\}$ for binary $X$. In this scenario, monotonic selection does not have the convenient “no defiers” interpretation; instead, selection is monotonic if $z > z'$ implies that $X_z \geq X_{z'}$ with probability 1, for any two values $z \neq z'$ of the instrument. From this, we can define the analogue of (13) for $z > z'$ as

$$\beta_{z,z'} = \frac{E(Y \mid Z = z) - E(Y \mid Z = z')}{E(X \mid Z = z) - E(X \mid Z = z')},$$

where $\beta_{z,z'} = E(Y_1 - Y_0 \mid X_z > X_{z'}) = \mathrm{LATE}_{z,z'}$ under monotonicity.

The 2SLS estimator for the additive SMM is obtained as the OLS estimator from the regression of $Y$ on $\hat{X}$, where $\hat{X}$ is the prediction from the first-stage regression of $X$ on $S = \{1, Z_1, \ldots, Z_{K-1}\}'$ and $Z_k = I(Z = k)$. Let monotonicity hold and the values of $Z$ be ordered such that $E(X \mid Z = k) > E(X \mid Z = k - 1)$. Imbens and Angrist (1994) show that the 2SLS estimator is consistent for

$$\beta_Z = \sum_{k=1}^{K-1} \mu_k \beta_{k,k-1},$$

where

$$\mu_k = \{E(X \mid Z = k) - E(X \mid Z = k - 1)\} \frac{\sum_{l=k}^{K-1} \{E(X \mid Z = l) - E(X)\} \pi_l}{\sum_{l=0}^{K-1} E(X \mid Z = l)\{E(X \mid Z = l) - E(X)\} \pi_l},$$

and $\pi_l = P(Z = l)$ such that $0 \leq \mu_k \leq 1$ and $\sum_{k=1}^{K-1} \mu_k = 1$; see also Angrist and Imbens (1995) and Angrist and Pischke (2009). In other words, when NEM fails but selection is monotonic, the 2SLS estimator is not consistent for $E(Y_1 - Y_0 \mid X = 1)$, but for a weighted sum of local average treatment effects.

Alternatively, if we define

$$\beta_{k,0} = \frac{E(Y \mid Z = k) - E(Y \mid Z = 0)}{E(X \mid Z = k) - E(X \mid Z = 0)},$$

then, following the proof given by Angrist and Imbens (1995), it is easily established that

$$\beta_Z = \sum_{k=1}^{K-1} \lambda_k \beta_{k,0},$$

where

$$\lambda_k = \{E(X \mid Z = k) - E(X \mid Z = 0)\} \frac{\{E(X \mid Z = k) - E(X)\} \pi_k}{\sum_{l=0}^{K-1} E(X \mid Z = l)\{E(X \mid Z = l) - E(X)\} \pi_l},$$

such that $\sum_{k=1}^{K-1} \lambda_k = 1$. However, in this case, $\beta_Z$ is only a weighted average of the $\beta_{k,0}$ (i.e., $0 \leq \lambda_k \leq 1$) if $E(X \mid Z = 1) \geq E(X)$.
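The two decompositions above are algebraic identities in the population quantities $E(X \mid Z = l)$, $E(Y \mid Z = l)$ and $\pi_l$, so they can be verified directly. The sketch below is illustrative; the function names `late_weights` and `tsls_estimand` and the numerical values are hypothetical and are not taken from the simulation designs in this paper.

```python
import numpy as np

def late_weights(p, pi):
    """Weights mu_k and lambda_k (k = 1..K-1), with p[l] = E(X|Z=l) and pi[l] = P(Z=l)."""
    p, pi = np.asarray(p, float), np.asarray(pi, float)
    dev = (p - p @ pi) * pi                    # {E(X|Z=l) - E(X)} pi_l
    denom = p @ dev                            # equals Var{E(X|Z)}
    tail = np.cumsum(dev[::-1])[::-1]          # tail[k] = sum over l >= k of dev[l]
    mu = (p[1:] - p[:-1]) * tail[1:] / denom
    lam = (p[1:] - p[0]) * dev[1:] / denom
    return mu, lam

def tsls_estimand(m, p, pi):
    """Population 2SLS estimand Cov{Y, E(X|Z)} / Var{E(X|Z)}, with m[l] = E(Y|Z=l)."""
    m, p, pi = (np.asarray(a, float) for a in (m, p, pi))
    dev = (p - p @ pi) * pi
    return (m @ dev) / (p @ dev)

p = np.array([0.1, 0.2, 0.3, 0.4])         # E(X | Z = l), hypothetical
m = np.array([0.40, 0.43, 0.47, 0.52])     # E(Y | Z = l), hypothetical
pi = np.full(4, 0.25)

mu, lam = late_weights(p, pi)                          # mu ~ (0.3, 0.4, 0.3); lam ~ (-0.1, 0.2, 0.9)
beta_adjacent = (m[1:] - m[:-1]) / (p[1:] - p[:-1])    # beta_{k,k-1}
beta_baseline = (m[1:] - m[0]) / (p[1:] - p[0])        # beta_{k,0}
beta_z = tsls_estimand(m, p, pi)                       # ~ 0.4
```

In this hypothetical example $E(X \mid Z = 1) = 0.2 < E(X) = 0.25$, so $\lambda_1$ is negative: the $\lambda$-weights still sum to one and reproduce $\beta_Z$, but they do not form a weighted average, which illustrates the caveat stated above.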

We now extend this result to the multiplicative SMM and give an analogous result for local risk ratios. In Section 5 we established that the one-step GMM estimator for $\exp(-\psi_0)$ using moment condition (17) was equivalent to a linear 2SLS estimator because

$$Y \exp(-X\psi_0) - \alpha_0 = Y(1 - X) + YX\exp(-\psi_0) - \alpha_0. \tag{28}$$

We can therefore straightforwardly generalise the above results of Imbens and Angrist (1994) for the additive SMM to the multiplicative SMM for the inverse local risk ratio. As above, let

$$e^{-\beta}_{k,k-1} = \frac{E\{Y(X - 1) \mid Z = k\} - E\{Y(X - 1) \mid Z = k - 1\}}{E(YX \mid Z = k) - E(YX \mid Z = k - 1)}, \tag{29}$$

where

$$e^{-\beta}_{k,k-1} = \frac{E(Y_0 \mid X_k > X_{k-1})}{E(Y_1 \mid X_k > X_{k-1})}$$

is the inverse local risk ratio under monotonicity; see Angrist (2001). We then get equivalent results to the above for the linear SMM, namely, the 2SLS estimator for $\exp(-\psi_0)$ in (28) is a consistent estimator of

$$e^{-\beta}_Z = \sum_{k=1}^{K-1} \mu_k e^{-\beta}_{k,k-1},$$

where

$$\mu_k = \{E(YX \mid Z = k) - E(YX \mid Z = k - 1)\} \frac{\sum_{l=k}^{K-1} \{E(YX \mid Z = l) - E(YX)\} \pi_l}{\sum_{l=0}^{K-1} E(YX \mid Z = l)\{E(YX \mid Z = l) - E(YX)\} \pi_l},$$

and so $e^{-\beta}_Z$ is a weighted average of inverse local risk ratios if $E(YX \mid Z = k) > E(YX \mid Z = k - 1)$. As in Angrist and Imbens (1995), the weights $\mu_k$ are proportional to $E(YX \mid Z = k) - E(YX \mid Z = k - 1)$, and hence the stronger the instrument, that is, the bigger the impact of the instrument on the regressor $YX$ in (28), the more weight (29) receives in the linear combination. The second component of the weighting gives more weight to the estimates (29) when the values of $Z$ are closer to the center of the distribution of $Z$ (see Angrist and Imbens (1995), page 437).

For the local risk ratio, we use the results from Section 5 that the one-step GMM estimator for $\exp(\psi_0)$ can be obtained from a linear IV estimator in the additive SMM with $YX$ as the “outcome” and $Y(X - 1)$ as the “treatment”, but with instruments a constant and $\hat{E}(YX \mid S)$. Let

$$e^{\beta}_{k,k-1} = \frac{E(YX \mid Z = k) - E(YX \mid Z = k - 1)}{E\{Y(X - 1) \mid Z = k\} - E\{Y(X - 1) \mid Z = k - 1\}},$$

where $e^{\beta}_{k,k-1} = E(Y_1 \mid X_k > X_{k-1})/E(Y_0 \mid X_k > X_{k-1}) = \mathrm{LRR}_{k,k-1}$ under monotonicity. It follows that the multiplicative SMM estimator is consistent for

$$e^{\beta}_Z = \sum_{k=1}^{K-1} \tau_k e^{\beta}_{k,k-1},$$

where

$$\tau_k = [E\{Y(X - 1) \mid Z = k\} - E\{Y(X - 1) \mid Z = k - 1\}] \frac{\sum_{l=k}^{K-1} \{E(YX \mid Z = l) - E(YX)\} \pi_l}{\sum_{l=0}^{K-1} E\{Y(X - 1) \mid Z = l\}\{E(YX \mid Z = l) - E(YX)\} \pi_l},$$

and hence $e^{\beta}_Z$ is a weighted average of local risk ratios if $E(YX \mid Z = k) > E(YX \mid Z = k - 1)$ and $E\{Y(X - 1) \mid Z = k\} > E\{Y(X - 1) \mid Z = k - 1\}$.

As an example, consider an instrument that takes the values $Z \in \{0, 1, 2, 3\}$, with $Y$ and $X$ generated from a bivariate normal distribution as

$$X = I(c_0 + c_1 Z_1 + c_2 Z_2 + c_3 Z_3 - V > 0),$$

$$Y = I(b_0 + b_1 X - U > 0),$$

with, as before, $Z_k = I(Z = k)$. Setting $\pi_l = P(Z = l) = 0.25$ for all $l$, the $c_l$ parameters are such that $P(X = 1 \mid Z = l) = 0.1 + 0.1 \times l$, $b_0 = \Phi^{-1}(0.4)$, $b_1 = 0.5$ and $\rho = 0.8$. The local risk ratios in this population are $\mathrm{LRR}_{1,0} = 1.1585$, $\mathrm{LRR}_{2,1} = 1.3227$ and $\mathrm{LRR}_{3,2} = 1.5303$; the population $\tau$-weights are $\tau_1 = 0.3725$, $\tau_2 = 0.3991$, $\tau_3 = 0.2285$.

Clarke and Windmeijer (2010) show that the NEM assumption does not hold under this design. However, the instruments are monotonic and so the one-step GMM estimator based on moment conditions (17) identifies the weighted average $\tau_1 \mathrm{LRR}_{1,0} + \tau_2 \mathrm{LRR}_{2,1} + \tau_3 \mathrm{LRR}_{3,2} = 1.3090$. Table 3 presents some estimation results confirming this, for a sample of size 40,000 and for 10,000 Monte Carlo replications. Using the two-step GMM results, the Hansen J-test rejects the null 47% of the time at the 5% level, therefore clearly having power to reject this violation of the NEM assumption.

Table 3
Risk ratio estimation results

            e^beta_{1,0}  e^beta_{2,1}  e^beta_{3,2}  e^beta_Z  tau_1    tau_2    tau_3
Mean        1.1644        1.3304        1.5415        1.3113    0.3726   0.3995   0.2279
St. dev.    0.0946        0.1213        0.1601        0.0377    0.0268   0.0321   0.0216

Notes: Estimation results from 10,000 Monte Carlo replications. Sample size 40,000.
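The weighted average quoted above can be reproduced from the reported population values. The short check below uses only the rounded figures given in the text, so agreement is expected only to about three decimal places; the variable names are hypothetical.

```python
# Population local risk ratios and tau-weights as reported in the text (rounded)
lrr = [1.1585, 1.3227, 1.5303]   # LRR_{1,0}, LRR_{2,1}, LRR_{3,2}
tau = [0.3725, 0.3991, 0.2285]   # tau_1, tau_2, tau_3

weight_total = sum(tau)                                   # close to 1
weighted_lrr = sum(t * r for t, r in zip(tau, lrr))       # close to 1.309
```

The result agrees with the stated value of 1.3090 up to rounding of the inputs, and lies close to the Monte Carlo mean of 1.3113 reported for the one-step GMM estimator in Table 3.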

8. THE EFFECT OF ADIPOSITY ON 
HYPERTENSION 

8.1 Binary Exposure 

Timpson et al. (2009) used multiple genetic instruments to estimate the causal effect of adiposity on hypertension from the Copenhagen General Population Study; full details of the variable definitions and selection criteria are given in that paper. We apply the procedures described above to reanalyse these data using additive, multiplicative and logistic SMMs, using the same genetic markers as instruments for adiposity. Furthermore, our sample includes additional individuals who have been recruited into the study since the previous study was published; the total number of individuals in our analyses is 55,523.

The binary outcome variable is an indicator of 
whether an individual has hypertension, which is 
defined as a systolic blood pressure of >140 mmHg, 
diastolic blood pressure of >90 mmHg, or the taking 


Table 4 

Combinations of instruments 


FTO 

MC4R 

z 

Freq. 

0 

0 

0 

0.20 

0 

1 

1 

0.15 

1 

0 

1 

0.27 

1 

1 

2 

0.21 

2 

0 

2 

0.09 

2 

1 

3 

0.07 


of antihypertensive drugs. The intermediate adipos¬ 
ity phenotype is being overweight, defined as having 
a BMI > 25. The two Single Nucleotide Polymor¬ 
phisms (SNPs) that were used as instruments by 
Timpson et al. (2009) and that have been consis¬ 
tently shown to relate to BMI and adiposity are the 
FTO (rs9939609) and MCJyR (rsl7782313) loci; see 
Frayling et al. (2007) and Loos et al. (2008). Lawlor 
et al. (2008) provide further details on the use of 
genes as instruments in Mendelian randomisation 
studies. 

FTO is specified as having three categories: no risk alleles (homozygous TT), one risk allele (heterozygous AT) and two risk alleles (homozygous AA). Due to the nature of the association between MC4R and adiposity (a dominant genetic model), MC4R is specified as having two categories: no risk alleles (TT) versus one or two risk alleles (CT or CC). Combining the two instruments results in an instrument with six different values, but we found that two pairs of combinations of alleles gave the same predicted value of being overweight; this is also true for the projection in the multiplicative SMM. We therefore condensed the number of values of the instrument to four. The combinations for the four values are given in Table 4. Table 5 gives the frequency distributions for the hypertension (Y) and overweight (X) variables.
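The mapping in Table 4 can be sketched in R as follows. This is a minimal illustration and not the code used for the analysis: the vectors fto and mc4r are hypothetical, and the sketch relies on the observation that each value of Z in Table 4 coincides with the sum of the FTO and MC4R codes. The indicator variables z1, z2 and z3 are one assumed way of forming the instruments $Z_1$, $Z_2$, $Z_3$ from the four-valued instrument.

```r
# Hypothetical genotype codes, one individual per row of Table 4
fto  <- c(0, 0, 1, 1, 2, 2)   # number of FTO risk alleles (0, 1 or 2)
mc4r <- c(0, 1, 0, 1, 0, 1)   # 1 if one or two MC4R risk alleles, else 0

# In Table 4 the condensed instrument equals the sum of the two codes
z <- fto + mc4r

# Indicator variables for Z = 1, 2, 3; Z = 0 is the reference category
Z <- cbind(z1 = as.numeric(z == 1),
           z2 = as.numeric(z == 2),
           z3 = as.numeric(z == 3))

print(cbind(fto, mc4r, z, Z))
```

Because the indicators are mutually exclusive, at most one of z1, z2 and z3 equals 1 for any individual.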


Table 5
Frequency distributions for the hypertension (Y) and overweight (X) variables

       All            Z = 0          Z = 1          Z = 2          Z = 3
       X              X              X              X              X
Y      0      1       0      1       0      1       0      1       0      1
0      0.18   0.12    0.19   0.12    0.19   0.12    0.17   0.13    0.16   0.13
1      0.25   0.44    0.27   0.42    0.26   0.43    0.23   0.46    0.23   0.48

Table 6
SMM estimation results of the effect of being overweight on hypertension

Additive            OLS                    2SLS                   GMM2                   J-test
$\psi_0$            0.2009                 0.2091                 0.2095                 0.2956
                    [0.1932; 0.2087]       [0.0485; 0.3697]       [0.0489; 0.3701]

Multiplicative      Gamma                  GMM1                   GMM2                   J-test
$\exp(\psi_0)$      1.3464                 1.3621                 1.3640                 0.3071
                    [1.3300; 1.3630]       [1.0784; 1.7204]       [1.0798; 1.7231]

Logistic            Logistic regression    GMM1                   GMM2                   J-test
$\exp(\psi_0)$      2.5823                 2.8317                 2.8656                 0.2924
                    [2.4885; 2.6797]       [1.2382; 6.4759]       [1.2538; 6.5489]

Notes: Sample size 55,523. Gamma regression uses log link; multiplicative SMM uses moments (17); logistic SMM uses joint moments (22); instruments, $S = \{1, Z_1, Z_2, Z_3\}$; 95% CIs in brackets; p-values are reported for the J-test.


The estimation results for the linear, multiplicative and logistic SMM estimators are presented in Table 6. The instrument set for the GMM estimators is $S = (1, Z_1, Z_2, Z_3)'$. For the linear SMM, the 2SLS and two-step GMM estimates are virtually identical to the OLS estimate. As the F-statistic in the regression of overweight on S is equal to 113, this is not due to a weak instrument problem. The OLS estimate of the risk difference is quite large and equal to 0.20 (95% CI 0.19-0.21). The two-step GMM estimate is almost the same and equal to 0.21 (95% CI 0.05-0.37), but clearly the 95% confidence interval is much wider for the two-step GMM estimate than it is for OLS. The J-test does not reject the null of the validity of the model assumptions, including the NEM assumption, and therefore these results indicate that there may not be much confounding bias in the OLS results. We find similar results for the multiplicative and logistic SMMs. The GMM estimates are virtually identical to the Gamma and the logistic regression estimates, respectively, and all estimates indicate that being overweight leads to hypertension. The Gamma estimate of the risk ratio is equal to 1.35 (95% CI 1.33-1.36), whereas the two-step GMM estimate is equal to 1.36 (95% CI 1.08-1.72). We present and compare the multiplicative SMM results with those of the Gamma generalised linear model (GLM) with a log link here, because moment conditions (17)-(19), when using X as an instrument for itself, are equivalent to the first-order conditions of the Gamma GLM with log link. The logistic regression odds ratio is equal to 2.58 (95% CI 2.49-2.68) and the two-step GMM estimate is equal to 2.87 (95% CI 1.25-6.55). All estimation results indicate a large causal effect of adiposity on hypertension.
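As a rough check on the association estimates, the crude risk difference, risk ratio and odds ratio can be recomputed from the "All" columns of Table 5. The sketch below uses the rounded proportions as printed in that table, so it can only approximate the OLS, Gamma and logistic regression estimates in Table 6; the object names are ours.

```r
# Joint proportions of (Y, X) from the "All" columns of Table 5 (rounded)
p <- matrix(c(0.18, 0.12,
              0.25, 0.44),
            nrow = 2, byrow = TRUE,
            dimnames = list(Y = c("0", "1"), X = c("0", "1")))

# Risk of hypertension within each overweight group: P(Y = 1 | X = x)
risk <- p["1", ] / colSums(p)

rd <- unname(risk["1"] - risk["0"])   # crude risk difference
rr <- unname(risk["1"] / risk["0"])   # crude risk ratio
# crude odds ratio (cross-product ratio of the 2 x 2 table)
odds_ratio <- (p["1", "1"] * p["0", "0"]) / (p["0", "1"] * p["1", "0"])

print(round(c(rd = rd, rr = rr, or = odds_ratio), 3))
```

With these rounded inputs the three measures come out at roughly 0.20, 1.35 and 2.64, close to the values 0.2009, 1.3464 and 2.5823 reported in Table 6.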


8.2 Continuous Exposure 


Following Vansteelandt and Goetghebeur (2003), we can use the same GMM format to estimate the logistic SMM with a continuous exposure X. With a continuous exposure, parametric modelling assumptions have to be made in order to identify causal parameters. As in Vansteelandt and Goetghebeur (2003) and Vansteelandt et al. (2011), we impose that the exposure effect is linear in the exposure on the log-odds ratio scale and independent of the instrumental variable:

\[
\frac{\mathrm{odds}(Y = 1 \mid X, Z)}{\mathrm{odds}(Y_0 = 1 \mid X, Z)} = \exp(\psi_0 X),
\]

where $\mathrm{odds}(Y = 1 \mid X, Z) = P(Y = 1 \mid X, Z)/P(Y = 0 \mid X, Z)$. Further, we specify the association model as

\[
\begin{aligned}
\mathrm{logit}\{P(Y = 1 \mid X, Z)\} &= \mathrm{logit}\{m_\beta(X, Z_1, Z_2, Z_3)\} \\
&= \beta_0 + \beta_1 X + \beta_2 Z_1 + \beta_3 Z_2 + \beta_4 Z_3 + \beta_5 X Z_1 + \beta_6 X Z_2 + \beta_7 X Z_3,
\end{aligned}
\]

and estimate the parameters using the joint moment conditions as in (22).

For the continuous exposure we use $(\mathrm{BMI} - \overline{\mathrm{BMI}})$, $10(\ln \mathrm{BMI} - \overline{\ln \mathrm{BMI}})$ and $10(\ln \mathrm{RELBMI})$, where ln BMI is the natural logarithm of BMI, and ln RELBMI are the residuals of the regression of ln BMI on sex, age, age squared, ln(height) and an age-sex interaction, as used in Timpson et al. (2009) to represent relative BMI. We subtract the mean from BMI and ln BMI to ensure that zero exposure is part of the data range. We further multiply ln BMI and ln RELBMI by a factor 10 so that the estimated odds ratio is for an increase in exposure of approximately 10%.

Table 7
Estimation results for double-logistic SMM with continuous exposure

Exposure          BMI                  ln BMI               ln RELBMI
$\exp(\psi_0)$    1.1187               1.3546               1.3337
                  [1.0984; 1.6705]     [1.0984; 1.6705]     [1.0929; 1.6276]
J-test            0.4714               0.4828               0.5004

Notes: Sample size 55,523. Two-step GMM estimates, using joint moments (22). Instruments, $S = \{1, Z_1, Z_2, Z_3\}$. BMI and ln BMI taken in deviation from the mean. ln BMI and ln RELBMI multiplied by a factor 10. 95% CIs in brackets; p-values are reported for the J-test.
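The construction of the three exposure measures can be sketched in R as follows. The data frame dat and its simulated columns (sex, age, height, bmi) are hypothetical stand-ins for the study variables; only the transformations follow the description in the text. The last line illustrates why the factor 10 corresponds to an increase of approximately 10%: a one-unit increase in 10 ln BMI multiplies BMI by exp(0.1).

```r
set.seed(1)
n <- 200

# Simulated (hypothetical) covariates; these are not the study data
dat <- data.frame(sex    = rbinom(n, 1, 0.5),
                  age    = runif(n, 20, 80),
                  height = rnorm(n, 170, 10))
dat$bmi <- exp(rnorm(n, log(26), 0.15))

# (1) BMI in deviation from its mean
dat$bmi_c <- dat$bmi - mean(dat$bmi)

# (2) 10 * (ln BMI - mean of ln BMI)
dat$lnbmi10 <- 10 * (log(dat$bmi) - mean(log(dat$bmi)))

# (3) 10 * residuals of ln BMI on sex, age, age squared, ln(height)
#     and an age-sex interaction
fit <- lm(log(bmi) ~ sex + age + I(age^2) + log(height) + age:sex, data = dat)
dat$lnrelbmi10 <- 10 * residuals(fit)

# Multiplicative change in BMI for a one-unit increase in 10 * ln(BMI)
print(exp(0.1))
```

All three measures are centred at zero by construction, so zero exposure lies within the data range.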

Table 7 presents the two-step estimation results for three separate models for the three exposure measures. Again, we find a strong positive effect of adiposity on hypertension. The estimate of the odds ratio for a one-unit increase in BMI is equal to 1.12 (95% CI 1.10-1.67), whereas the estimates of the odds ratios for a one-unit increase in 10 ln BMI or 10 ln RELBMI, that is, an increase of approximately 10% in BMI or relative BMI, are 1.35 (95% CI 1.10-1.67) and 1.33 (95% CI 1.09-1.63), respectively; the latter two are therefore virtually identical. Also, for these logistic SMMs with continuous exposures, the J-test results do not indicate a problem with the model assumptions.


9. DISCUSSION 

We have shown how the conditional moment conditions that identify additive, multiplicative and logistic SMMs can be used to derive a standard GMM estimator of the type widely used in econometrics. The key to this formulation is simply to treat the expected exposure-free potential outcome $E(Y_0)$ as a parameter. For simple SMMs without continuous baseline covariates, these estimators are semiparametrically efficient if the identifying instrumental variables are orthogonal binary variables. In these cases, the estimator combines the instruments optimally in the manner proposed by Bowden and Vansteelandt (2011). Another major advantage is that standard GMM routines are available in statistical software packages. We provide example Stata and R syntax in the Appendix for use by applied researchers. These estimation routines provide correct asymptotic inference, even for the logistic SMM, when the two sets of model parameters are estimated jointly, and a simple test for the validity of the SMM moment conditions. We used Monte Carlo studies to show that the Hansen J-test can have power to detect violations of the CMI and NEM assumptions. Moreover, if the NEM assumption fails and selection is monotonic, then we have shown that the one-step GMM estimator for the multiplicative SMM is consistent for a weighted average of the instrument-specific local risk ratios.
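To relate the J-test p-values in Table 6 to the underlying test statistic, recall that under the null the Hansen J statistic is asymptotically chi-square distributed with degrees of freedom equal to the number of overidentifying restrictions. The sketch below assumes two such restrictions (four instruments in S and two parameters, $E(Y_0)$ and $\psi_0$); the J statistics themselves are not reported in the paper, so the values computed here are implied rather than quoted.

```r
# J-test p-values as reported in Table 6
p_reported <- c(additive = 0.2956, multiplicative = 0.3071, logistic = 0.2924)

# Assumed degrees of freedom: 4 moment conditions minus 2 parameters
df <- 2

# Implied J statistics; for 2 degrees of freedom this equals -2 * log(p)
J_implied <- qchisq(p_reported, df = df, lower.tail = FALSE)
print(round(J_implied, 2))

# Round trip: recover the p-values from the implied statistics
p_back <- pchisq(J_implied, df = df, lower.tail = FALSE)
print(p_back)
```

Under this assumption the implied statistics are all roughly 2.4, well below the 5% critical value of a chi-square distribution with two degrees of freedom (about 5.99).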

A characteristic of all estimating equations for SMMs is that the analyst must specify and estimate auxiliary models further to the SMM. Extending the discussion in Section 2.3 to multiple instrumental variables, the estimating equations for G-estimation depend on $E(Z_j) = \mu_j$, which must be replaced in the estimating equation by a consistent estimator $\hat{\mu}_j$. To derive the correct asymptotic distribution, the moment conditions for $\mu_j$ must be included in the system of moment conditions. For the multiplicative SMM with multiple instruments discussed in Section 4, the extended set of moment conditions is

\[
\begin{pmatrix}
E(Z_1 - \mu_1) \\
E(Z_2 - \mu_2) \\
E\{(Z_1 - \mu_1) Y \exp(-\psi_0 X)\} \\
E\{(Z_2 - \mu_2) Y \exp(-\psi_0 X)\}
\end{pmatrix}
=
\begin{pmatrix}
0 \\ 0 \\ 0 \\ 0
\end{pmatrix}.
\tag{30}
\]

The extended moment conditions can easily be incorporated in the Stata and R GMM estimation routines, and we include in the Appendix code that does this for the additive, multiplicative and logistic SMMs.
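As an illustration of how the extended moment conditions for the multiplicative SMM might be passed to gmm() in R, the sketch below defines a moment function in the same style as the R functions in the Appendix. The function name msmmExtMoments, the parameter ordering and the simulated data are our own assumptions, and the gmm() call is only attempted if the gmm package is installed.

```r
# Moment function for the extended conditions of the multiplicative SMM.
# Assumed ordering: theta[1] = mu_1, theta[2] = mu_2,
#                   theta[3] = psi_0 (log causal risk ratio)
msmmExtMoments <- function(theta, x){
  Y  <- x[, "y"]
  X  <- x[, "x"]
  Z1 <- x[, "z1"]
  Z2 <- x[, "z2"]
  m1 <- Z1 - theta[1]
  m2 <- Z2 - theta[2]
  m3 <- (Z1 - theta[1]) * Y * exp(-theta[3] * X)
  m4 <- (Z2 - theta[2]) * Y * exp(-theta[3] * X)
  cbind(m1, m2, m3, m4)
}

# Small simulated data set (hypothetical, for illustration only)
set.seed(123)
n  <- 500
z1 <- rbinom(n, 1, 0.3)
z2 <- rbinom(n, 1, 0.4)
x  <- rbinom(n, 1, plogis(-0.5 + z1 + z2))
y  <- rbinom(n, 1, 0.3 * exp(0.3 * x))
data <- cbind(y = y, x = x, z1 = z1, z2 = z2)

# The first two sample moments vanish at the sample means of z1 and z2
g <- msmmExtMoments(c(mean(z1), mean(z2), 0), data)
print(colMeans(g))

# Two-step GMM fit, only if the gmm package is available
if (requireNamespace("gmm", quietly = TRUE)) {
  fit <- gmm::gmm(msmmExtMoments, x = data,
                  t0 = c(mean(z1), mean(z2), 0), vcov = "iid")
  print(exp(coef(fit)[3]))  # estimated causal risk ratio
}
```

With four moment conditions and three parameters the system has one overidentifying restriction, so the over-identification test remains available after fitting.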

There are two relative weaknesses of our approach in applications where covariates C are required for identification, in other words, where CMI holds only conditionally on covariates, such that $E(Y_0 \mid Z, C) = E(Y_0 \mid C)$ but $E(Y_0 \mid Z) \neq E(Y_0)$. To discuss these weaknesses, consider a multiplicative SMM which does not depend on C but where covariates are still required for identification. In terms of a GMM estimator, the unconditional moment conditions [equivalent to (17) in Section 4.2] are

\[
E\left[ \begin{pmatrix} S \\ C \end{pmatrix} \left\{ \frac{Y}{\exp(\psi_0 X)} - E(Y_0 \mid C) \right\} \right] = 0,
\tag{31}
\]

which can be seen to depend on the extended instrument $(S', C')'$ and $E(Y_0 \mid C)$ as well as the SMM itself.

The first weakness is that the efficiency result for two-step GMM discussed above does not hold if C includes continuous covariates or if the resulting extended instrument cannot otherwise be represented by a set of mutually orthogonal binary variables. In such scenarios, the two-step GMM estimator is only locally efficient given the unconditional moments, which here are (17). Newey (1993) discusses different approaches to improve efficiency, for example, using power-series expansions of the instruments.

The second weakness is that consistency of the GMM estimator now depends on the model for $E(Y_0 \mid C)$ being correctly specified. By definition, this model cannot be empirically tested for misspecification because it is determined by the SMM; but the consequence of misspecifying it is an inconsistent GMM estimator. In contrast, the G-estimators and the double-logistic SMM estimator discussed in Section 2 require only that $E(Z \mid C)$ is correctly specified, which can be empirically tested for misspecification. Likewise, the doubly robust estimating equations proposed by Tan (2010) depend on covariate-conditional models for Z, X given Z, and Y given X and Z, all of which can be tested for misspecification. The doubly robust property is attractive in theory, but these estimators are not available in standard software, and further work is required to explore fully, rather than locally, efficient choices of weights for the estimating equations. Further work on the GMM estimators proposed here with continuous covariates might investigate the bias and efficiency of GMM estimators, both asymptotically and in finite samples, compared to existing estimators for SMMs; see Okui et al. (2012).

APPENDIX: STATA AND R SYNTAX 

In this section we present example Stata (version 11) and R (version 2.13.1) syntax to fit SMMs using generalised method of moments routines. Our example code uses the notation of Y for the outcome, X for the exposure and two instrumental variables, $Z_1$, $Z_2$, in addition to the constant vector of 1's. Both syntaxes easily generalise to more instruments and allow different association models in the double-logistic SMM.

In both Stata and R it is possible to specify analytic first derivatives, which we find greatly reduces the time for the models to fit. Also, both syntaxes allow the inclusion of covariates. We have not included these extra syntaxes here but they are available on request.


Stata Syntax 

The Stata syntax uses the gmm command, and {ey0} denotes $E(Y_0)$, the mean exposure-free potential outcome. After fitting each SMM using two-step estimation we perform the Hansen over-identification test using the estat overid postestimation command. The gmm command automatically includes a vector of 1's as instruments to allow estimation of the constant [$E(Y_0)$] term; hence, we just need to list z1 and z2 in the instruments() option.

Additive SMM. Here {psi} denotes the causal effect (which is a risk difference for a binary outcome).

gmm (y - {ey0} - x*{psi}), instruments(z1 z2)
estat overid

This is equivalent to Stata's built-in ivregress command.

ivregress gmm y (x = z1 z2)
estat overid

Multiplicative SMM. Here {psi} denotes the log causal risk ratio, and hence we display the exponentiated estimate using the lincom command with its eform option after fitting the model.

gmm (y*exp(-1*x*{psi}) - {ey0}), instruments(z1 z2)
lincom [psi]_cons, eform // causal risk ratio
estat overid

We also give the Stata syntax for the alternative multiplicative SMM moments. Here {logey0} denotes $\log\{E(Y_0)\}$ and so we additionally display the exponentiated form of this parameter after fitting the model.

gmm (y*exp(-x*{psi} - {logey0}) - 1), instruments(z1 z2)
lincom [psi]_cons, eform // causal risk ratio
lincom [logey0]_cons, eform // E[Y(0)]
estat overid

Expanded moments for multiplicative SMM.

gmm (z1-{mu1}) ///
    (z2-{mu2}) ///
    ((z1-{mu1})*(y*exp(-1*x*{psi}))) ///
    ((z2-{mu2})*(y*exp(-1*x*{psi}))), ///
    winitial(identity)
lincom [psi]_cons, eform // causal risk ratio
estat overid
Logistic SMM. Here {psi} denotes the log causal odds ratio. In the joint estimation we use the gmm command's linear predictor substitution syntax (we denote the linear predictor for the association model by {xb:}). We collect the association and causal model parameter estimates in a matrix called from; we then use these estimates as initial values in the joint estimation. Also, in the joint estimation we specify the winitial(unadjusted, independent) option so that the moments are assumed to be independent in the first step of estimation. Note that in Stata, invlogit(x) = expit(x) = e^x/(1 + e^x).

* generate interactions
gen xz1 = x*z1
gen xz2 = x*z2

* association model
logit y x z1 z2 xz1 xz2
matrix from = e(b)
predict xblog, xb

* causal model with incorrect SEs
gmm (invlogit(xblog - x*{psi}) - {ey0}), instruments(z1 z2)
matrix from = (from,e(b))

* joint estimation of association and causal models
gmm (y - invlogit({xb:x z1 z2 xz1 xz2} + {b0})) ///
    (invlogit({xb:} + {b0} - x*{psi}) - {ey0}), ///
    instruments(1:x z1 z2 xz1 xz2) ///
    instruments(2:z1 z2) ///
    winitial(unadjusted, independent) from(from)
lincom [psi]_cons, eform // causal odds ratio
estat overid

R Syntax

The R syntax uses the gmm() function in the gmm package (Chaussé (2010)), which we first load using library(gmm). After fitting each SMM using two-step estimation we perform the Hansen over-identification test using the specTest() function. The R code assumes our data is in a matrix called data whose columns contain the values of the variables Y, X, $Z_1$ and $Z_2$, in this order, with column names "y", "x", "z1", "z2".

In this code we have specified the vcov="iid" option, which assumes the moment conditions are independent. We find specifying this option is necessary for the models to converge on reasonably sized data sets. We also find that changing the optimization algorithm used in the estimation through the method option can reduce the time it takes the models to fit (we find the BFGS and L-BFGS-B methods are the fastest).

Additive SMM. First, we fit the additive SMM using the gmm() function's formula syntax for linear models.

asmm <- gmm(data[,"y"] ~ data[,"x"], x=data[,c("z1","z2")], vcov="iid")
print(summary(asmm))
print(cbind(coef(asmm),confint(asmm))) # estimates
print(specTest(asmm))

We can also pass the moment conditions to gmm() using its function syntax. In order to do this, we first define a function asmmMoments() which returns the additive SMM moments. This function must have two arguments. The first, theta, denotes the vector of parameters to be estimated, where theta[1] is $E(Y_0)$ and theta[2] is the causal risk difference. The second argument, x, is the data matrix; the user must avoid confusion here with the single variable X. In the gmm() function the t0 option specifies the initial values of the parameter estimates. After we have fitted the model with the call to gmm() we print out the model summary, then the estimates and their 95% CIs, and finally the over-identification test using specTest().

asmmMoments <- function(theta,x){
  # extract variables from x
  Y <- x[,"y"]
  X <- x[,"x"]
  Z1 <- x[,"z1"]
  Z2 <- x[,"z2"]
  # moments
  m1 <- (Y - theta[1] - theta[2]*X)
  m2 <- (Y - theta[1] - theta[2]*X)*Z1
  m3 <- (Y - theta[1] - theta[2]*X)*Z2
  return(cbind(m1,m2,m3))
}
asmm2 <- gmm(asmmMoments, x=data, t0=c(0,0), vcov="iid")
print(summary(asmm2))
print(cbind(coef(asmm2),confint(asmm2))) # estimates
print(specTest(asmm2))

Multiplicative SMM. We again use the gmm() function syntax to fit the multiplicative SMM. First we define the function msmmMoments() to return the moments. After fitting the model we print the model summary. Here theta[2] is the log causal risk ratio, and so we print the exponentiated form of this parameter.

msmmMoments <- function(theta,x){
  # extract variables from x
  Y <- x[,"y"]
  X <- x[,"x"]
  Z1 <- x[,"z1"]
  Z2 <- x[,"z2"]
  # moments
  m1 <- (Y*exp(- X*theta[2]) - theta[1])
  m2 <- (Y*exp(- X*theta[2]) - theta[1])*Z1
  m3 <- (Y*exp(- X*theta[2]) - theta[1])*Z2
  return(cbind(m1,m2,m3))
}
msmm <- gmm(msmmMoments, x=data, t0=c(0,0), vcov="iid")
print(summary(msmm))
print(exp(cbind(coef(msmm), confint(msmm))[2,])) # causal risk ratio
print(cbind(coef(msmm), confint(msmm))[1,]) # E[Y(0)]
print(specTest(msmm))

We can also fit the alternative multiplicative SMM moments in the same way. Here theta[1] denotes $\log\{E(Y_0)\}$, and so we print out the exponentiated form of both estimates:

msmmAltMoments <- function(theta,x){
  # extract variables from x
  Y <- x[,"y"]
  X <- x[,"x"]
  Z1 <- x[,"z1"]
  Z2 <- x[,"z2"]
  # moments
  m1 <- (Y*exp(-theta[1] - X*theta[2]) - 1)
  m2 <- (Y*exp(-theta[1] - X*theta[2]) - 1)*Z1
  m3 <- (Y*exp(-theta[1] - X*theta[2]) - 1)*Z2
  return(cbind(m1,m2,m3))
}
msmm2 <- gmm(msmmAltMoments, x=data, t0=c(0,0), vcov="iid")
print(exp(cbind(coef(msmm2), confint(msmm2)))) # exponentiate estimates
print(specTest(msmm2))

Logistic SMM. In estimation of the logistic SMM, especially with the joint moments, it is important to check that convergence has been reached, either by inspecting the model summary or by checking that the model's algoInfo$convergence attribute is equal to 0. If convergence has not been reached, a higher iteration limit (say, 5000) can be specified in gmm() through the option control=list(maxit=5000). Note that in R, qlogis(p) = log(p/(1 - p)) and plogis(x) = expit(x) = e^x/(1 + e^x).

First we fit the association model using the glm() function to fit the logistic regression. Again we collect the parameter estimates and predicted values.

We then fit the causal model using the function cmMoments() to return its moment conditions. In this function theta[1] denotes $E(Y_0)$ and theta[2] denotes the log causal odds ratio.

In the joint estimation the function lsmmMoments() returns the moment conditions. In this function theta[1:6] are the coefficients in the association model, theta[7] denotes $E(Y_0)$ and theta[8] denotes the log causal odds ratio.

# association model
am <- glm(y ~ x + z1 + z2 + x*z1 + x*z2,
          data=as.data.frame(data), family=binomial)
print(summary(am))
amfit <- coef(am)
xblog <- qlogis(fitted.values(am))

# causal model with incorrect SEs
cmMoments <- function(theta,x){
  # extract variables from x
  X <- x[,"x"]
  Z1 <- x[,"z1"]
  Z2 <- x[,"z2"]
  # moments (xblog is taken from the enclosing environment)
  c1 <- (plogis(xblog - theta[2]*X) - theta[1])
  c2 <- (plogis(xblog - theta[2]*X) - theta[1])*Z1
  c3 <- (plogis(xblog - theta[2]*X) - theta[1])*Z2
  return(cbind(c1,c2,c3))
}
cm <- gmm(cmMoments, x=data, t0=c(0,0), vcov="iid")
cmfit <- coef(cm)

# joint estimation of association and causal models
lsmmMoments <- function(theta,x){
  # extract variables from x
  Y <- x[,"y"]
  X <- x[,"x"]
  Z1 <- x[,"z1"]
  Z2 <- x[,"z2"]
  XZ1 <- X*Z1
  XZ2 <- X*Z2
  # association model moments
  # (each continued line ends with "+" so that R reads one expression)
  xb <- theta[1] + theta[2]*X + theta[3]*Z1 +
        theta[4]*Z2 + theta[5]*XZ1 +
        theta[6]*XZ2
  a1 <- (Y - plogis(xb))
  a2 <- (Y - plogis(xb))*X
  a3 <- (Y - plogis(xb))*Z1
  a4 <- (Y - plogis(xb))*Z2
  a5 <- (Y - plogis(xb))*XZ1
  a6 <- (Y - plogis(xb))*XZ2
  # causal model moments
  c1 <- (plogis(xb - theta[8]*X) - theta[7])
  c2 <- (plogis(xb - theta[8]*X) - theta[7])*Z1
  c3 <- (plogis(xb - theta[8]*X) - theta[7])*Z2
  return(cbind(a1,a2,a3,a4,a5,a6,c1,c2,c3))
}
lsmm <- gmm(lsmmMoments, x=data, t0=c(amfit,cmfit), vcov="iid")
print(summary(lsmm))
print(cbind(coef(lsmm), confint(lsmm))[7,]) # E[Y(0)]
print(exp(cbind(coef(lsmm), confint(lsmm))[-7,])) # exponentiate other estimates
print(specTest(lsmm))